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ABSTRACT 


We examine two categories of solution algorithms for the 
large-scale multicommodity transshipment problem (MCTP): 
resource direction and price direction. In the former 
category we construct RDLB, a new algorithm which uses a 
Simplified projection method in the subgradient capacity 
reallocations and conjugate subgradient directions with 
approximate line search to provide better termination 
conditions in the Lagrangean lower-bounding iteration. In 
the latter category, we develop DDC, a dual decomposition, 
and we introduce RSD(P)} and RSD(A), new non-linear 
decomposition algorithms for the MCTP based on penalty 
transformations of the original problem and using restricted 
Simplicial decomposition. 

Computational results are presented for four- and ten- 
product versions of a problem with an underlying network of 
3,300 nodes and 10,400 arcs. Results show RDLB stalls 
before reaching optimality, apparently a common problem in 
primal subgradient reallocations, while the RSD algorithms 
reach near-optimal solutions up to 10 times faster than a 
direct primal simplex-based solver, and display very 
favorable convergence rates compared to DDC. As a final 
test, RSD(A) and DDC are applied to a 100-product problem 


totaling 330,000 nodes and 1,040,000 arcs. RSD(A) reaches 


an acceptable solution within 432 of optimality in under 17 


minutes, while ppc terminates after 68 minutes with a 122 


gap remaining around the optimal solution. 
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I. NEROSUCTION 


Minimum-cost single-commodity capacitated transshipment 


problems are now solved routinely using primal network 


Simplex codes. See, for instance, (Bradley, Brown, and 
Graves, 1977). However, when multiple products flow over 
the same network, the pure network structure can be 


confounded by the presence of constraints limiting the total 
flow of all products on each arc. Thus, single-commodity 
solvers may not be applied directly, and, as the number of 
products increases, the size of the constraint matrix grows 
so rapidly that conventional simplex solvers become useless. 
Because ante is a frequently encountered  problen, 
specialized algorithms for the multicommodity transshipment 
problem (MCTP) have been widely studied and reported in the 
literature. However, none of the methods has been so 
generally successful that it dominates other methods, as 
primal network algorithms do in the single-commodity arena. 
This paper documents new algorithms for the MCTP which 
fall into the broad categories of "resource-directive" and 
"price-directive." The first algorithm is an enhancement of 
a popular resource-directive approach using subgradient 
optimization, but incorporating a Simplified primal 
projection together with conjugate subgradient directions in 


the Lagrangean lower bound steps. The algorithm promises 


ease of computation in the primal restriction and improved 
criteria for termination over previous algorithms, but 
apparently shares a common problem of subgradient-based 
resource-direction: stalling in the primal before reaching 
optimality. 

The set of price-directive algorithms we present 
includes a variant of dual (Dantzig-Wolfe) decomposition and 
a new family of decomposition algorithms uSing a penalty 
transformation of the original MCTP to create a non-linear 
master problem which is solved by restricted simplicial 
decomposition. This iS a unique approach to the MCTP not 
previously attempted which exhibits superior convergence 
rates in computational tests compared to other solution 
methods. The algorithm is quite general and therefore 
applicable to a wide variety of linear programs exhibiting 
complicating constraints. 

We introduce the MCTP in the following section, and then 
give an overview of major solution approaches to the MCTP in 
Section B, and a more detailed literature review in Section 
C. The resource-directive algorithm is presented in Chapter 
II and the price-directive algorithms in Chapter III. 
Computational results appear in Chapter IV and Chapter V 


presents conclusions and areas for future research. 


A. STATEMEN®D OF THE PROEBEM 
The general form of linear program in which we are 


interested is: 


(LP) min cx (duals) 
st A ix Ss Dy (uj) 
GPS 815) (U5) 
0 <= b3 
where one set of constraints, A»)x = bo, 0 < X & Boe 


deemed to be easy to solve, and a second set, Ax =a 
complicates the problem. The vector u = (uj,U9) 1s the set 
of dual multipliers associated with the constraints with 
uz < 0 and uy unrestricted in sign. 

The following notation will be used throughout this 
presentation. We assume all products flow over the same 
underlying network. Let |Z| denote the cardinality of a set 
bx 

T = {I,J} is a transshipment network with a set of nodes, 
I, and a set of arcs, J. 

P is the set of products flowing on T. 

1 « I is a node of the network. 

j ¢ J is an are in the network. 


pe P l# @ preduct using Neework @. 


Np is an (|I| x|J]) node-arc ‘incidence matrix for each 
produc. (a = «... = Nipy)- 
N is an (|1I|°|P|]) * (|J|°|P]) melterix with thé Np matrices 


along the diagonal, O's elsewhere. 
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c= peter is a vector of costs on the arcs, 


hanei, TaR sp 
Soe... , seal is a vector of flows on the arcs, 
length SP 


b, is a vector of joint capacities with length |J| 
b> = right hand side with 
bopi > 0 if product p has a supply at node is, 
bopi < 0 if product p has a demand at node i, and 
Doni = 0 otherwise. 


Rees a lvl) < fo |° ||P) melterix = {I,...,I1}. 


We specialize LP to MCTP by letting A> = N, by = by, and 


dropping the subscript on A}: 


(MCTP) min ex (duals) (eles) 
st Ax < by (4) el 2) 

Nx = bo (U2) (1.3) 

Ors Xp < Pueet allapee P . (1.4) 


The easy constraints are the single-product pure network 
flow constraints (1.3,1.4), and the complicating constraints 
are the joint capacitation constraints (1.2). We assume for 
notational simplicity that all arcs have joint capacities, 
although in practice only a subset have such restrictions. 


For convenience, let 
Fo= {(X|NX = b>, 0 < Xp < PoeecOmeall p « P} 


be the set of all feasible single-commodity flows with joint 


capacity constraints’ relaxed. In the absence of the 


diel. 


complicating these constraints (1.2), the MCTP decouples 
into a set of independent single commodity networks. 

The Lagrangean dual of MCTP with respect to the joint 
capacity constraints is found by placing these constraints 


in the objective with multipliers uj, < 0: 


(LR) max man L(x) = €xX — osGex Soo 
u ,<0 x>O0 
st oc eee 


According to duality theory, if x* solves MCTP, then L(uj,X) 


S cx™ for U, <s O and xX <™r, Furthermore, a solution 


(iy eee) to LR has (uy) oe = cx”. Thus, we may use LR 


to generate a lower bound on MCTP by fixing uy <90 amd 


solving the following problem: 


(R(uy )) min ce -— u,b) 
i 
st X¢€ F , 


and this bound will be tight if u, is chosen correctly. 
Solving the Lagrangean dual generally allows some 
constraints to be violated with penalty u,(Ax-b,). For any 


x « F, we denote the set of violated constraints as 


ge = Sh) eam, A4x > by5} ; ( eS) 


Ti 


Peo eR VIEW OF SOLUTTON TECHNIQUES 
Solution methods for large-scale linear programs may be 
divided into three broad categories: direct factorization 
or compact inverse methods, indirect resource-directive 
methods, and indirect price-directive methods. Factoriza- 
tion approaches exploit specific structure inherent within 
the constraints to produce a compact basis representation 
with which the steps of a primal or dual simplex algorithm 
may be performed. Direct factorization is not pursued in 
this study (see, for instance, Graves and McBride, 1976). 
The other two approaches, resource and price direction 
are decompositions which divide the original problem into a 
master problem and subproblems which exchange information to 
solve the original problem indirectly by iteration. 
The resource-directive approach solves the MCTP as a two 
Pare minimization: 
InLer amb el — (bke 
y20 x20 


st NX 


I 
oO 
NS) 


A 
o) 


me YY 
Ay = by 


Yp £ 5, for all pe P 


The vector y allocates joint capacities to the individual 
commodities. For fixed y, the solution to the inner 


Meme zacLon is 
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(RS (y) ) Viypo= min cx 
st Nx = bo 


Ome & ey 


which is the "subproblem" or "subproblems" since the 
individual commodities are no longer coupled and may be 
solved independently. 


The outer minimization can now be written as 


(RD) min V(y) 


st Ay = Dy 


Yp £ 5, for all pé« P 
Y» 280 
Standard methods fer solving (RD) are Benders 
decomposition and subgradient optimization. Benders 


decomposition creates a master problem which makes 
successive tangential approximations to V(y); the tangent 
planes are derived by solving subproblems (RS(y)) whose 
Capacity (resource) allocations are determined by the master 
problem. More details of Benders decomposition are given in 
Chapter III but the method is not pursued because the master 
problems become unwieldy. 

Subgradient optimization solves (RD) in a fashion which 
is analogous to a projected gradient algorithm which could 
be used if V(y) were differentiable. Given a feasible 
allocation yx in the kth iteration, a feasible reallocation 


is obtained by 
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where sX is a scalar step length and dy, is a feasible 
direction. Since V(y) is not everywhere differentiable, aX 
is a projected subgradient rather than a projected gradient. 
Standard subgradient optimization does not use a line search 
to determine s* since ak is not guaranteed to be a descent 
direction. However, we devise an algorithm which performs 
an approximate line search when applicable in the lower 
bound routine. Subgradients are determined via solutions of 
the subproblems RS(y); the master problem of this procedure 
1s the reallocation mechanism. 

We present price direction as a class of penalty 


problems, 


max min ext Pz ax—b7) CIS) 
Zee 
st x é€F 


where P(*) is a penalty term involving those constraints 
deemed to be complicating. The vector z is determined 
differently for various forms of P(°*), but in general is 


involved in constructing an optimal vector of dual 


multipliers. 
MEEting P(u;,Ax=b)) = -u,(Ax-b,), we see that (1.6) 
becomes LR, -zZ = u, estimates the optimal dual multipliers, 


and by fixing u, < 0, the inner minimization amounts to 


solving the Lagrangean subproblem LR(u}). 


TS 


Solution of the Lagrangean dual, i.e., (1.6) with linear 
penalties, cannot guarantee an optimal solution to MCTP 
although optimal u,~ and x™ for MCTP are in the set of 
solutions to LR. Nevertheless, optimizing LR is useful for 
bounding purposes and it iS sometimes possible to obtain 
good solutions to MCTP from LR. The method used in this 
work to solve LR is subgradient optimization. 

Subgradient optimization for LR simply updates the 


multiplier estimates at each iteration k, by the formula 
u,Xtl = min (0,u,* - sk(axk-p)) 


while controlling the scalar steplengths sk, resolving the 
Lagrangean to obtain a new xkt+l and seeking feasibility only 
indirectly via the penalty terms. The dual update mechanism 
is the master problem for LR while the subproblems are 
LR(u,*). 

Standard dual (Dantzig-Wolfe) decomposition may be 
interpreted as a special method of solving (1.6) with 


piecewise-linear penalty function, 


P(Z,AX=b}, ) = ae (Ax~bj ) 

and 2; ==” if (Ax-b))4 > 0 
Zi — 0 if (Ax-b}) 3 0 
Z5 EOr) eg fe aL = 0 


Z is determined by solving the master linear program of 


Chapter III.A, yxrel@ing Che aale uy — =z; Subproblem 
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solutions attempt to provide descent directions for this 
penalty function. 

P(*) may also take standard non-linear forms such as the 
quadratic function, 5h] (Q(x) |] 14, where z is replaced here 
by the scalar h, Q(x) = (A3x-b14)* = max(0,A4x-bj4), and 
|{|°|| denotes the Euclidean norm. Penalty function theory 
tells us precisely how to solve this problem: solve the 
inner minimization ash*+~. As h*@, x > x* and h(Ax-b,)?* 
converges to an optimal set of dual multipliers. 

However, starting with h large produces a highly ill- 
conditioned problem, so the problem is usually solved for a 
sequence of increasing values for h, producing a sequence of 
improving, but infeasible solutions. An augmented 
Lagrangean penalty function is investigated for reducing 
1ll-conditioning and speeding convergence. 

A reasonable approach for solving these nonlinear 
penalty problems is some feasible descent method. For fixed 


h, feasible descent directions are derived for cx + P(h,Ax- 


b) at x = x by solving the linear subproblem 


min vg(cx + P(h,Ax-b1))x 


st x € F 


to obtain x. The Frank-Wolfe algorithm (1956) would perform 
a linear search in the direction x-x to obtain a new x and 
iterate. This type of master problem tends to have poor 


convergence in practice so we employ Restricted Simplicial 
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Decomposition (Hearn et al., 1984) which maintains a 
restricted set of extreme points of F together with x. 
Instead of solving a simple line search the master problem 
solves a nonlinear program over the convex hull of the 
retained points. The loss of simplicity in the master 
problem is typically offset by improved convergence of the 
overall algorithm. 

There is no strong rationale for replacing a hard linear 
problem with a seemingly more difficult nonlinear problem. 
Consequently, this approach has not’ previously been 
considered for large-scale applications. However, we show 


in Chapter III that this approach can be attractive. 
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C. SURVEY OF RELATED LITERATURE 

This section reviews the literature which has led to the 
current state of the art in algorithms for the MCTP. We 
begin with a brief overview of single-commodity network 
algorithms because of their computational importance in many 
algorithms for the MCTP. We then mention prior 
contributions to a few special cases of the MCTP, and 
finally review literature on algorithms used to. solve 
general MCTPs. 

Single-commodity network flow problems have been widely 
studied since the 1940s, for two primary reasons: they are 
frequently encountered and their special structure lends 
itself to algorithms which are more efficient than general 
linear programming techniques. The constraint matrix of the 
pure network problem 1s a node-arc incidence matrix: all 
O's and +t1's, with at most one +1 and one -1 per column. 
This particular structure has three desirable properties: 
total unimodularity, which guarantees integer primal 
solutions given integer right-hand sides, and integer dual 
solutions given integer costs, a basis matrix which may be 
triangulated by permutation of rows and columns and thus 
simplifying computation, and primal extreme point solutions 
equivalent to spanning trees in the network. Several 
algorithms were developed in the 1950s and 1960s which made 
at least some use of these properties. Primal simplex 


methods were proposed by Dantzig (1963) and Fulkerson and 
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Dantzig (1955). Primal-d@Wal methods Became popalan an 
practice at that time, including Kuhn's Hungarian Method 
(1955) and the out-of-kilter algorithm of Ford and Fulkerson 
(1962). 

Primal-dual methods were favored throughout the 1960s, 
but some works, most notably the basis-labelling scheme of 
Johnson (1966), set the stage for research which led to the 
efficient primal network algorithms in use today. 
Subsequent research focused on efficient data structures and 
their Manipulation, which led 0 compact basis 
representations and efficient performance of the simplex 
pivot. Significant research was done in the 1970s by 
Srinivasan and Thompson (1972,1973), Glover, Karney and 
Klingman (1974), Glover, Karney, Klingman and Napier (1974), 
Glover, Klingman and Stutz (1974), Barr, Glover, and 
Klingman (1979), and Bradley, Brown and Graves (1977). 
Results presented by these researchers demonstrated the 
primal network simplex to be up to two orders of magnitude 
faster, to require much less memory than general simplex 
solvers, and to be about 40% faster than out-of-kilter 
codes. 

This research has given us network codes, such as GNET 
(Bradley, Brown and Graves, 1977), which can solve large 
(say, many thousands of nodes and arcs) single-commodity 


network problems very efficiently. This research is doubly 
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important for the MCTP since it allows efficient subproblem 
solution in many of the algorithms developed for the MCTP. 

Some of the pure network structure found in the single- 
commodity network problem carries over into the MCTP. 
Single-product networks do appear as blocks along the 
diagonal of the constraint matrix, but, unfortunately, the 
joint capacity constraints couple the networks together and 
generally destroy total unimodularity of the constraint 
matrix, admitting fractional solutions and requiring real 
arithmetic. It is appealing to try to restore total 
unimodularity or product independence by transformation. 
Some success in this area has been reported by Evans 
(1976,1978a,1978b,1983) and by Evans, Jarvis and Duke 
C1977). The class of problems for which their methods work 
1s quite restricted, however: total unimodularity is shown 
to hold when the number of sources or sinks for each product 
is less than or equal to 2, and the existence of 
transformations to single-product networks is shown when the 
number of sinks per product equals 2. ‘Thus, this is of 
little help to the general MCTP. 

Algorithms for the general MCTP are broadly classified 
as direct methods which exploit special structure using the 
simplex algorithm or indirect methods which use some form of 
decomposition. The indirect methods include price-direction 


and resource-direction. 


ZA. 


We first review the development of price-direction. 
Recognizing that the number of extreme points in a master 
problem may be huge, Ford and Fulkerson (1958) devised a 
procedure for the multicommodity maximal flow problem which 
uses column generation to produce favorable extreme points 
aS needed. In their formulation of the problem, the 
variables correspond to chains or paths through the network, 
and the constraints represent individual arcs. Dantzig and 
Wolfe (1960) formalized the procedure into the dual 
decomposition procedure in which the master problem is 
solved to provide pricing information to the subproblems. 
Solving the subproblems produces an extreme point, which is 
added to the master problem if it is favorable, or indicates 
optimality of the current master problem solution if it is 
neice 

Tomlin (1966) first formulated and implemented dual 
decomposition for the minimum cost MCTP, showing the 
equivalence of the Ford-Fulkerson algorithm to the algorithm 
described by Dantzig and Wolfe. Others have developed 
extensions of the MCTP using the  dual-decomposition 
approach. Cremeans, Smith and Tyndall (1970) and Wollmer 
(1972) formulated a model in which flow on some arcs depends 
on the availability of resources which are shared with other 
arcs. Weigel and Cremeans (1972) extended the model to 
allow the flow of each commodity to be measured in distinct 


units, joint arc capacities in common unite, SG ae 


me 


incorporate node capacity constraints. Swoveland (1973) 
presented a generalization of the MCTP which involved a 
decomposition in which a single subproblem is solved in two 
stages. 

Considered an effective technique in early iterations, 
dual decomposition has a reputation for poor convergence, 
with progress toward optimality tailing off in later 
iterations. Ho (1984a) speculated that this failure to 
converge is due to numerical error in computer 
implementation. Ho and Loute (1983) compared solution times 
of several problems for a commercial linear programming 
package and decomposition codes. They concluded that 
standard LP is more efficient when it 1s practical, but 
pointed out that decomposition is nearly as efficient for 
the large problems tested. Ho (1984b) further speculates 
that efficiency of the decomposition increases with problem 
dimension. 

Simplicial decomposition, an indirect price-directive 
method applicable to non-linear quasi-convex programs was 
introduced by von Hohenbalken (1977). Convergence of a 
restricted version of simplicial decomposition was recently 
demonstrated by Hearn et al., (1984). 

We now turn to resource-direction, which solves the MCTP 


using the subproblem 


eo 


RS (y) V(y) = min cx 


st Nx = D5 


and the master problem 


RD min V(y) 
st yw i) 
Yop £ 5; for all p< P 
ye 200 


Because the master problem provides allocations feasible in 
MCTP, RS(y) produces an upper bound on MCTP. When RS(y) has 
a feasible solution, it is feasible in MCTP. It is a simple 
matter for the modeller to introduce explicit artificial and 
Slack arcs with penalty costs to insure that any allocation 
will produce a feasible solution to MCTP. 

Resource-directive algorithms proceed by iteratively 
solving the restriction, and then updating the allocations 
in some favorable manner and resolving. Geoffrion (1970) 
provides a good overview of the resource-directive approach. 
The predominant method for computing new allocations iS via 
subgradient directions, which are a generalization of the 
gradient for convex OF concave functions with 
nondifferentiable points. This method is applicable since 


V(y) 18 a piece-wise linear, convex function. 
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The subgradient approach appeared first for solving sets 
of linear inequalities (Agmon, 1954; Motzkin and Schoenberg, 
1954), and then appeared in the Russian literature adapted 
to optimization problems, e.g., Polyak (1967). The 
optimization version was first applied in the Western 
literature to the travelling salesman problem by Held and 
Karp (1970,1971), and further developed as a general method 
for optimization by Held, Wolfe and Crowder (1974). The 
convergence rate and stepsize considerations were studied by 
Goffin (1977) and Bazaraa and Sherali (1981). Fisher (1985) 
summarized the subgradient approach to solving the 
Lagrangean dual. 

Resource-directive algorithms for the MCTP' using 
subgradients in the direction-finding process for optimizing 
RD were developed by Kennington and Shalaby (1977), 
Rosenthal (2918 3) and Allen (1985). Ali, Helgason, 
Kennington, and Lall (1980) declare their resource-directive 
algorithm to be faster than either a price-directive or a 
special basis factorization code in their computational 
tests. : 

While subgradient resource-directive methods are 
considered exact (Kennington and Helgason, 1980), zigzagging 
in subgradient methods is known to be a problem (Sandi, 
1979), so convergence to optimality is frequently slow at 
best. Thus, the reported computational advantage of 


resource direction using subgradient updates is based on its 


Zo 


purported ability to reach an acceptable near-optimal point 
before simplex-based methods reach optimality (Ali et al., 
1978, 1980; Kennington and Shalaby, 1977). 

The final major category of algorithms for the MCTPaite 
basis factorization or compact inverse methods. Algorithms 
in this category employ a primal or dual simplex approach, 
but exploit the special structure of the LP basis for the 
MCTP to reduce the size of the basis inverse that must be 
carried expaic aia. A dual partitioning method was 
presented by Grigoriadis and White (1972). Graves and 
McBride (1976) discussed the primal approach in the general 
context of mathematical programming. Hartman and Lasdon 
(1972) presented a theoretical primal approach based on the 
Generalized Upper Bounding (GUB) technique of Dantzig and 
Van Slyke (1967). Incorporating graph theory, they factored 
the MCTP basis into a network basis for each product and a 


"working basis" with dimension equal to the number of 


currently binding joint capacity constraints. Several 
efficiencies result. Primal network simplex techniques 
apply to computation on pure _ network bases. Only 


computations involving the working basis require real 
arithmetic, and this basis is generally quite small. 
Kennington (1977) implemented the method and further 
study has been done by Helgason and Kennington (1977). Ali, 
Barnett et ale, (£964) eone1uaee that the GUB approach is 


three to five times faster than standard LP codes, but other 
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studies generally rate resource-directive 


Superior to the CUB approach 


algorithms 
(see, for instance, 2S ee 


awe and Kennington and Shalaby, 1977). 
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II. A RESOURCE-DIRECTIVESAPPROACH. TO ii MET 


This chapter presents a resource-directive method for 
solving the MCTP which incorporates a simplified projection 
mechanism in the primal subgradient capacity reallocations 
and conjugate subgradient directions with approximate line 
search in a Lagrangean lower-bounding routine. The 
procedure is based on the method of Allen (1985) which 
solves two essentially independent sequences of problems, 
the resource-directive sequence co generate feasible 
solutions and upper bounds, and a Lagrangean sequence to 
generate lower bounds. The purposes for these enhancements 
to Allen's procedure are to provide a computationally- 
supportable, theoretically sound proj] ection in the 
reallocation routine, to provide a better mechanism for 
termination of the procedure, and to attempt to generate 
ascent directions in the Lagrangean routine to make line 
search worthwhile. We first restate the general form of the 
problem to be solved, and then introduce the concept and 
required theory of subgradient optimization. At that point 
we will be able to state clearly the shortcomings of 
previous approaches and present the procedure based on our 
improvements. 

The resource directive approach attempts to solve the 


problem 
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fee mine Cx et) 


y x 
She Ax = bj (2.2) 
Nx = bo (2.3) 
OF < Yo < boymeectomeall p ¢« P (2.4) 
> ®, bZ.. 2} 
eae (2.6) 
Byesceleccting a particular y « Y, 1.e., a set of capacity 
allocations satisfying (2.2) and (2.4), the remaining 
subproblem is 
V(y) = min cx (duals) 
st Nx = bo (U2) 
x < Yy (U3) 
Oh. 


which is a set of restricted, single-product transshipment 
problems. The original problem is then {min V(y) st y <« Y}, 
so the outer minimization forms a master problem which seeks 
improving capacity allocations. ss 

Before turning to the subgradient approach to solving 
this problem, we briefly mention another approach to this 
problem, “Benders decomposition." First, we make the 
Simplifying assumption that all problems have _ finite 
feasible solutions and recall the theorem of quality to 
establish that the optimal values of a linear program and 


its dual are equal. Now the dual of our  subproblem is 
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V(y) = Max U5D> Sey (2. 


A 
Q 


st U5N suo 


V 
© 


Up ieee, Uae 


and letting the constraint set be represented by its extreme 
points, {Use}, {Use}, where e e« E, the index set of extreme 


points, it may be rewritten 


MaX UgeD9 - UzeyY 
= 


SUbStICUuting fer V(y) in the master problem yields 


{Min max(Uxebo - Uzey) st y ¢« Y} , 
e 


or, equivalently, 


st Z2e> U5ee> =e y e € E 


Ay = 25] 
Os ys by 
This is the Benders master problem. In practice, the 


extreme points of the subproblem are not all known, so we 
generate them by iteratively solving the subproblems to 
produce a new extreme point, and then add it to the master 
problem in th@® form of a constraint. The master, in turn, 
is solved to produce a new allocation. This is the process 
of the Benders (1962) decomposition algorithm which treats y 


aS a complicating variable. A full presentation of this 


a0 


method, which also considers conditions of unboundedness, 
may be found in Lasdon (1970). The approach is not pursued 
here because of the large size of the Benders master 
problem. 

The subgradient approach attempts to solve the master 
problem by simple updates of the capacity allocations at 
each iteration, ykt1 = Pry - skqak}, where Pr indicates a 
projection such that ykt1 «e Y, ak is a subgradient and sk is 


from a scalar step sequence, (sk) satisfying 
) sk = ae sk > 0, and sk +o0as k + ~, CZ. 3) 


Polyak (1967) demonstrated that these projected updates 
converge in the limit to an optimal solution simply by 
assuring that (2.8) is met. Letting yk+1 = yx + skgk | the 
projection finds a feasible yXtl = argmin({}|y-ykt1))2 st 
VY ace Yuh 

The subgradient itself is a generalization of the 
gradient for a function with nondifferentiable points. If f 
is a convex function with nondifferentiable points defined 
on a feasible region F, then d is a subgradient of f at 


7 « Yemef 
f(z) > £(2) + d'(z-z) for all z¢«F 


The set of subgradients at a point is called _ the 


subdifferential, defined by 
a9f(z) = (dl/f(z) > f(z) + a'(z-z) for all z ¢F} ; 


oy 


when z 18S a aq@ifferentiable point of ff, odf(z) = VE(z)ye 
Furthermore, there exists a set of subgradients which are 
the limits of the one-sided directional derivatives at z. 
These particular subgradients, referred to as "primitive 
subgradients," may be used in convex combinations to 
generate any member of the - subdifferential. This 
presentation 1s made in greater detail by Rockafellar (1970) 
and Sandi (1979). 

Demjanov (1968) demonstrated that at any nondifferenti- 
able point of a convex function f, there exists a 
directional derivative -d, = f£'(z) « d£(z) which J6.3gee 
locally best direction of descent. That direction is the 


minimum norm of the subdifferential; that is, 
d, = argmin {lla} |? st g ¢« deze 


which is the point of the subdifferential closest to the 
origin. Furthermore, a point z* is an unconstrained optimum 
if andwonlyaat sara= £'(e") = 0. 

Although using the minimum norm may be attractive as a 
descent direction, the required primitive subgradients are 
usually not all Known; the usual approach is to find a 
Single subgradient at each iteration and update the 
allocations using it along with a step sequence satisfying 
(2:80 The method works because.any element of the current 
subdifferential forms an acute angle with the true direction 


to the optimal point, so by taking a step of the appropriate 
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length, we move pointwise closer to the optimal solution at 
each iteration. However, it is not necessary for the 
objective function value to improve at each iteration. The 
sequence achieves an optimal point when a zero-subgradient 
is encountered (Held, Wolfe, Crowder, (1974)). 

The method is attractive due to its extreme simplicity 
and is commonly used as a mechanism for multiplier 
adjustment in Lagrangean relaxations (e.g., Fisher, 1985) 
and for capacity reallocation in resource-directive 
algorithms for the MCTP (see Kennington and Shalaby (1977), 
Rosenthal (1983), or Allen (1985)). 


The choice of stepsize sequence is critical to the 


success of subgradient methods. For instance, the harmonic 
series, sk = 1/k, satisfies (2.8) but exhibits poor 
convergence in practice (Bazaraa and Sherali, 1981). 


Convergence has also been demonstrated for 
eeu (a) )9)// I, (4~! | 


where 0 <n < 2 is a scalar multiplier, v*~ is the optimal 
objective function value, and gk is the norm of the 
current subgradient (Polyak (1967)). Since v* is generally 
not known, several researchers (for example, Kennington and 
Shalaby (1977) and Bazaraa and Sherali (1981) have replaced 
it by approximating values such as upper bounds or 
combinations of bounds. Goffin (1977) discusses a geometric 


stepsize progression which exhibits quadratic convergence, 


3:3 


but may not achieve an optimal point. Some good experiences 
have been reported for these methods, but these "heuristic" 
stepsizes frequently produce poor convergence, or, worse, 
may converge to non-optimal solutions. It is common 
practice to include in the process a heuristic rule for 
modifying the value of n when progress is slow. 

We now present the specific application of the 
subgradient approach to the MCTP, first to the resource 
direction routine in Section A, then to the Lagrangean 
routine in Section B, and finally present the overall 


procedure in Section C. 
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A. GENERATING UPPER BOUNDS VIA RESOURCE DIRECTION 
We use the subgradient approach in resource direction to 
perform the update, yXt1! = prfy* + skak} for yk,yk+l « y, to 


solve the problem min V(y) st y « Y where 
Y 


V(y) = min cx 
st Nx = b> 
x < Y 


x > 0 


Notice that changes in capacity allocations act as 
parametric changes to the right-hand side of the subproblen, 
so V(y) is convex with respect to changes in y: a proof of 
this 1s given in Kennington and Helgason (1980). 

To identify an appropriate subgradient, we first recall 
(2.7). For any allocation y producing a feasible solution, 
we have V(y) = usb5 - u3y, where W>anag =u, solve the dual 
program. The following proof from Kennington and Helgason 


(1980) shows that -u3 is a subgradient of the function V at 
ve 
Lemma 2.1: Let y > 0O be any feasible allocation to V(y) 


and (uU9,U3) be the associated optimal dual 


solution. Then -u3 is a subgradient of V at 


ve 


Proof: Let jy, y e Y be any feasible allocations, with 


Sptimale dual Solutions (5,03), (U2,U3;) In V(y), 


Ss 


V(y), respectively. Then 


as a as 


V(y) - V(y) = Ugbg - u3y - (ugb2 - u3y) > 

U5b5 = U3 - (Upgb5=Usy) = =U5 Gay or 

V(y) > V(y) - u3 (y-y) SO -U3 1S a subgradient of Vv 
at YY. “One. 


Therefore, a subgradient is directly available from the 
Subproblem at each iteration as the negative of the optimal 
dual uz, for aikbecationm- 

If -u3z were a feasible direction, the subgradient 
reallocation would be ykt1 = yk-sku3k, However, since -u3 
generally yields infeasible allocations, we project the 


infeasible reallocation y = y*-sku,k to a feasible 


allocation, ykt1 by solving the quadratic program 


min || (y**t1-y) | | 


st yktl e Y 


An equivalent form of this problem is 
min y (Yes Sips st ykt1l Ne 
Pp J 


which decomposes on j}. Solving a quadratic program for each 
jointly constrained arc at each iteration is a burdensome 
task, so a heuristic projection is generally used which does 


not involve the quadratic functions. 
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Unfortunately, computational results of Allen (1985), 
for instance, indicate that the method is unable to obtain 
near-optimal solutions on even moderate-sized problems, even 
when the bulk of the solution time iS spent on the primal 
reallocations. 

Two factors seem to contribute to this failure. First, 
the nature of the subgradient itself is that no primitive 
element of the subdifferential 1s guaranteed to be an 
improving direction. Second, the space of the primal 
variables is relatively large, complicating the problem. 
That is, if Jp = are Xp" = b15 se optimal in MCTP}, then 
in applying the supemtaient approach to the Lagrangean 
problem, we work with O(|Jp|) dual variables at _ each 
iteration; in the primal resource allocation, we make 
O(|Jpi°|P]) reallocations. Thus, as the number of products 
grows, the master problem becomes substantially harder. 

We implement a projection method which ignores 
0 < Yp < bi and maintains the bounds externally by, in 
essence, a simple minimum ratio test. The resulting 
projection can be solved analytically, and applied in 
practice through a simple set of calculations. Therefore, 
although it does not relieve the previously mentioned 
shortcomings of the subgradient method, it does not add to 
them. Furthermore, it preserves theoretical convergence of 


the method for a stepsize sequence satisfying (2.8). 


oy 


If we let y* and u3X be the allocation and due 


varlables obtained at iteration k, the infeasible 


reallocation y and the feasible (projected) allocation 


yxtl, then we may express the last two allocations as 
y = yk - sku,k 
and 
yktl = yk — sky,k 
The resulting quadratic program is 
min | |yk*1-y| |? 
st AykKt1 = Dil 
Recognizing that yktl-y - ~sku,k tr sky3k, ehyeart. Ay = bi, and 


that sX is merely the scale factor, an equivalent problem is 


min | Ju3K-u3%] | 2 


st Au3* = 70 


For a single constraint j, dropping the iteration count, 


the resulting problem is 


~ 


min U35'U35 - 35 'u34 


st 1'u35 = 0 
The Kuhn-Tucker conditions for this problem are 


“~ 


Ugj ~ 43; 7 ewe (2.2) 


Sys. 


where w is the dual variable associated with the constraint. 
Premultiplying by ic recognizing that 1'u3; = 0 and solving 
we find w= -(1'1) “11 'u35. Substituting into (2.7) and 
rearranging, the familiar projection 


> > 


> > 
u34 = (I ~ 1(1'1)711')u35 


> > 


is obtained. Observing that (1'1)71 = 1/]P| ana rewriting 
to summation form, we see that the projected subgradient for 
the p-th product on arc j is 


U3p3 = CC IPI—-1)U3p5/1P1) - mm U3p'93/7I1PIl = U3pj57435 
p'Ap 


where 


u35 = ) U3p3/1PI 
p 


eae <i 


The associated reallocation update is Yp Yp 
sK(u3p-U3), where sk is from an appropriate stepsize 
sequence, (sk), 

Since the bounding constraints oere ignored, Ypj < 0 or 
Yp} > Uj may result. In this situation we perform a minimum 
ratio test, setting 0 < s;' < s;* such that 0 < ypj < bj; 
for all p on are j. We drop all products with Xpj = 0 and 
U3p3 < U34 on arc j, recomputing 35 and reallocating among 
the remaining products to Simplify the process 


computationally. In addition, if some Xp = bs with U3p'5 


= min(U3p34), no reallocation is performed for that arc. 
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Finally, 12 6 p50= 33 for’ all p on sofie atc j, S@@hG@iEgs 
reallocation@eccuHs *oneehaweane: 
We write the procedure as follows, given xk u3* from 


the current subproblem, and a stepsize sk; 


REALLOT: Compute u34* = (Aju3)/| P| 
{For eachwarse, a 
Cond 1: If there is a4 p' with Xp'5 = D135 


and Uzp'43 = min(U3p5), 
P 


let ypj*** = yp3* 
Cond 2: If uzpjy = 343 for all p, 


k+1 — JK 
= Yipa 


let Ypj 
Cond 3: Identify Py = {(P|Xpj = 0,U3pj 2 35} 
If Py ¥ 9, recompute 


U34 = ) U3p97 (1 P lel Bil) 
J péPy PJ 


Over all p @ Py 
set Ypj\"" = Ypj* — $4%(u3p3-039) 
such that 0 < i < sk and 
<_< by; for all p « P 


Elia, Wor } 


Subgradient reallocation need only be performed for the 
set of joint arcs which are currently tight. For all other 
arcs, if there is some Xpj= Ypj with favorable reduced 
costs, but A4x és bi; for that j, we perform a simple 


reallocation in the manner of Rosenthal (1983), taking slack 
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from any p' with Xp'5y < Yp'3 Ueto sceall used; Thus, in 
the solution process, all available slack on a joint arc j 
is offered to each product until it is consumed, at which 
point that arc becomes a candidate for subgradient 
reallocation. 

In practice, we first perform simple reallocations until 
no further improved solutions are _ found, and then 
incorporate subgradient reallocations into the ensuing 
mMeSrations for tight joint constraints, The set of tight 
constraints is updated with each pass through’ the 
subproblems. We note that if at any iteration k, there is 
no arc j to which a simple reallocation may be applied and 
all arcs with A;x = bij also have uzp3 = u33 for all p, we 
have achieved an optimal point (Rosenthal, 1983). 

Unfortunately, the optimality condition is not 
frequently achieved, nor is a useful lower bound available 
from the dual information of the restricted problems, so we 
establish lower bounds on MCTP through a Lagrangean 
relaxation routine in hopes of a ra through bound 
convergence. The Lagrangean approach is explained in the 


following section. 
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B. LAGRANGEAN RELAXATION USING CONJUGATE SUBGRADIENT 
DIRECTIONS 


Lagrangean relaxation is a frequently-used device to 
assist in solving linear programs in which a set of 
complicating constraints are placed into the objective 
function with penalty multipliers which are estimates of the 
optimal dual multipliers to the original problem. Recalling 
Chapter I, the problem becomes 


LR Maxgemien Cx - U7 (Ax=b)) St =x gee 
u,<90 x>0 


which is frequently solved by fixing uj < 0 and solving 


LR (uj) V(uz,) = min(c-u,A)x + u,b, 


Sti, Meee He 


to yield a lower bound on the original problem. A new uj, is 
then found and LR(uj,) resolved, repeating until optimality 
1s achieved. 

A common method of accomplishing the u, updates is by 
subgradient optimization, where g = (Ax-b,) is a subgradient 


and 
u,ktl = min(0,u,%-skgk) 


is the update. The "min" operator solves the projection 
min||u,**1-u,||2 st u,**! < 0 (see for instance, Fisher, 
1985). However, the method has several drawbacks. First, 


due to the nature of the subgradient, the bound i Gee 
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always improved, and due to the sensitivity of the process 
to eho) Cemot sk, convergence may be slow, or the process may 
converge to a non-optimal point. Due to the relaxation, it 
is not necessary that a primal feasible solution ever be 
generated and, since we only determine one primitive 
subgradient at each iteration, we are unlikely to find a 0- 
subgradient and thus recognize an optimal solution. 

In this section we introduce a procedure which seeks to 
overcome some of these difficulties by retaining information 
on previous subgradient directions in order to approximate 
the subdifferential at the current point. The concept, 
developed originally by Wolfe (1975) involves computing a 
"conjugate subgradient" direction at each iteration which is 
the nearest point to the origin of a set of m retained 
subgradients. The method relies on the property that the 
subdifferential of any point can be arbitrarily well 
approximated by the convex hull of gradients of the points 
in its immediate neighborhood (see Rockafellar, Loy On ect or 
details). The algorithm generates a sequence of points 
(u,*} and subgradients (gk} and computes a search direction 


aK as the solution to 


(CD) min Ital |? 


for some integer m. At some step of the algorithm, some 
k Kime. Kem, 
subsequence of points, (Uj,%*,Uj pe +o pT } 1s close enough 


together that 

y£(zK) = conv{gk,gk71,...,gk-™) 
and 

ak = 0 


When this occurs, we are sufficiently close to the optimal 
solution to terminate with u,* as an approximation for uj”. 
Wolfe's algorithm calls for a single variable search to 
optimize the objective in the conjugate subgradient 
direction; this 1s computationally expensive since the inner 
minimization involves solving the set of network subproblems 
for various step lengths. So, we finda sx approximately. 


Define a nominal stepsize s, as 


Ss, = n(V-V¥)/1 IgX| | 


where V1gX] | lis the Euclidean norm of the current 
subgradient, and n is a scalar multiplier. V is the current 
upper bound, and V is the current lower bound. Then we 
evaluate W(n) = V(u;* + sdk} for n= 0,1, and 2. Singe th 
objective function value of the dual of a linear program is 
piecewise-linear and concave as a function of the multiplier 
values (Bazarra and Shetty (1979)) this surface can be 


approximated by fitting a quadratic interpolating function 
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through these three observations. Using the Newton-Gregory 
forward equation to interpolate on equally spaced data, we 
calculate 


ne aad 


W(n) = W(0) + nW1 + .5n(n-1)W2 (2.10) 
ror 0 < n < 2, where 
Wl = W(1) - W(0) 
and 
W2 = W(2) - 2W(1) - W(0) 


Details may be found in Gerald and Wheatley (1984), for 


example. To compute the appropriate step length, we select 


nk = argmax (Wn) | 0 < . er2 ) 
and set sk = nk (v-v)/| |g} |. The resulting multiplier 
update is u,ktl = max (0,u,*+skaX) , and we solve for V(u,Kt1) 
to complete the approximate line search. 

Solving LR(u,*t1) provides a new subgradient gkt1 = 
(Axkt1l-p,)t which is added to the retained set. We are then 
ready to compute a new conjugate direction using (CD). 

Since we are approximating the subdifferential, ak will 
not always be an ascent direction. In that case, we simply 
take a small step in the direction ak generate gkt1 eke, 
improve our local approximation of the subdifferential, 


compute a new akt+l and continue. 
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When dk is near zero, we must insure that the sequence 
of u,'s generated is suitably close together to adequately 
approximate the subdifferential at ujX, Wolfe suggests this 
amounts to the check J }Ju,k-M - u,K-nt+1)| < mM for some 
M > 0. “Lf Sebere cond teten holds, u,k is accepted as near 
optimal; if not, all retained subgradients are dropped and 
the process is restarted from uz. 

It is possible to produce feasible capacity allocations 
from Lagrangean solutions, which may be used as starting 
points for resource-directive procedures. These allocations 
must be integer to preserve integer arithmetic in the 


network subproblems. Given a current value for x, the 


capacity allocation y may be computed by 


(CA) Pr[Xp34° (bi4/) Xp5) i) sem ( Dyk da) 
Pp 


Pr{Xp4 + () Xpj-b15)/1PI] if j # Jy 
p 


where Pr[*] indicates a projection onto the set of integers 
satisfying ) Yop} = Uy and yp; 2 0, integer for all p and 
| Cae he | oe Simple rounding is sufficient. 

Note that the conjugate subgradient approach is also 
applicable to the resource-direction problem, but we chose 
not to use the conjugate approach due to excessive storage 
requirements. While for the Lagrangean we must store m 
subgradient vectors, in the resource-directive problem we 


must store m*|P| vectors. 
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C. A RESOURCE DIRECTIVE PROCEDURE WITH LAGRANGEAN 
LOWER BOUNDING 


We now present a procedure combining the resource 
direction of Section A with the Lagrangean routine presented 
in Section B for solving the MCTP. Upper bounds) and 
feasible solutions are obtained via the resource direction, 
while lower bounds are obtained from the Lagrangean routine. 
Although we present a specific stepwise arrangement, the 
resource directive and Lagrangean routines are coupled only 
by the values of the _ bounds. They may be performed 
sequentially, iteratively, or in parallel, exchanging bound 
information when appropriate. 

Define the stopping criterion e > 0 as the allowable 
relative error between bounds, D > 0 as the acceptance 
criterion for a near-zero direction, and U > O as the 
allowable Euclidean separation of points used to approximate 
the current subdifferential. Let V and V be the current 
bounds and x be the incumbent solution. Also let K be the 
maximum allowable number of iterations. 

The Algorithm RDLB follows: 

Input: The network T = {I,J}, and joint capacity vector 
Pa, Gnome or seach product pp = 1,...,iP|, a cost 
vector Cpr and a supply/demand vector bop 


—- 


Output: Incumbent solution x, and incumbent value cx. 


step 0 (Initialize): Select e >0, D>0O, U>0O, m>l, 


ea imeeqersse: ck — ©, uo = 0, J, = 0 
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Solve LR(0), compute Jy = (3 |A;x°-b5 >, 10%, 
Evaluate V(0)) obtawningsx 
If J, # 0, stop with x° optimal in MCTP 
Else, V = V(0) 
set d° = g® = (Ax°-b,)* 
Set y° = CA(e?) 
Solve RS(y°) with xy" optimal, set Vv - v(y°), 
= xy" 


If (V -vV)/V < e, exit. 


step 1: Solve Lagrangean 
la (Line Search): Compute s = n(V-V)/||gX| |, solve 
W(n) @for nes, 2 


n ~ 4 


Set nk = argmax{ W(n), O<n < 2} 
sk = nk(v-v)/||9*1 | 
lb (Move to new point): 
Set u,Xtl = max(0,u,* + skaX) 
Solve LR(u,**1) 
rf weap) sam, w Sea 
1, (V-v)/v < e, exit with incumbent x 


Else, compute subgradient gk+1 where 


La 
oa 


| 0 otherwise 


K-b5 for jms de 
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set Jy = Jy 4 (3 |AQx=by5 > 0) 
set gktl = tgkt1 — pgk-mt1) 
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lc (Compute New Direction): 


Let aktl = argmin{||d||¢ stde conv (GKt1) ; 


V 


De seeaks— k+1, go to step 1. 


|A 


Dand ) |Juy¥M-u,k-n+1)) <u, 
n=0 
k = 1, go to step 2 


Tf | {ak | 


[A 


D and ) | Ju, k-N-y, k-n+1) | Su, 
n=0 
set d° = gktl, 





u,° = u,X, k = 0O, a g, 


qo eo steppes. 


step 2 (Resource Direction): 


2a. SsoOlvem@rReniy), fLorsthe current yx 


lp a (sg) ue = V(y), x = xy 
If (V-V)/V < e, exit with x 
Else let Jp = (j|A}x;*-b,5 = 0) 
29%. (Reallocate) 
For all jJ« Im, perform the procedure REALLOT 
to obtain von = Yp3* - s3*(u3zp4*-u35*) 
lets ips = Ypi* homed)! © Im, exXlt 
If k > K, exit 


Else, go to step 21. 


Two aspects of this algorithm represent additional 
computational burdens over a standard subgradient approach 
which must be justified. First, the direction-finding step 


requires solving a quadratic program. This is a small 
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program of m variables which may be solved efficiently as a 
linear complementarity problem using the  Kuhn-Tucker 
conditions (Bazaraa and Shetty, 1979). Since the purpose of 
computing conjugate directions is to reduce the zigzagging 
common to subgradient algorithms, the potential for fewer 
iterations justifies the effort. 

Second, the line search requires two additional 
Subproblem evaluations at each iteration. This means, in 
effect, that 3 normal subgradient steps can be taken for 
each conjugate step taken. It is not clear that this effort 
will always be justified. An alternate form of the 
conjugate subgradient algorithm which overcomes this problem 
first takes a series of subgradient steps, retaining the 
subgradients, and then periodically performs a conjugate 
subgradient step with quadratic approximation. This 
amortizes the cost of the search over several iterations. 
After each conjugate step, old subgradients are dropped and 
a new collection begins. Computational results presented in 
Chapter IV show that the conjugate directions method does 
generate a direction resulting in a large increase to the 
lower bound every few iterations, but 1s rather slow. 
However, because the entire procedure performs much worse 
than other algorithms tested, the alternate forms suggested 


above have not been tested. 
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tit eee Reh, ORRECTIVE DECOMPOSTTION ALGORITHMS 


In this chapter we develop the theory of dual 
decomposition and introduce a new decomposition algorithm 
which solves a penalized version of the original linear 
problem. This is a procedure which uses’7 restricted 
simplicial decomposition to construct a nonlinear master 
problem and conveys price information to the subproblems 
through the gradient of the objective function. With proper 
choice of penalty parameters, the subproblems quickly 
produce good lLagrangean lower bounds on the original 
problem, and improving primal solutions are produced by 
resource direction using the (infeasible) penalized master 
problem solutions. 

Both algorithms are applicable to general linear 


programs and their development here is accordingly general. 


—_ 


el 


A. THE DUAL DECOMPOSITION ALGORITHM 

The standard approach in describing the dual 
decomposition algorithm is to put forth the idee ian 
representing the feasible region of a subset of the 
constraints of an LP as a convex combination of its extreme 
points. This reduces the number of constraints, but 
introduces a large number of variables (the extreme points 
of the dualized constraints), so the problem becomes one of 
finding the extreme points which, in convex combination, 
describe the optimal solution to the original problen. 
Dantzig-Wolfe decomposition combines this extreme point 
representation with column generation, which is the process 
of generating extreme points as required. 

If we let X be a matrix whose columns are the extreme 


points of F, then any x in F may be expressed as 
=> 
Xx = Xw, l°-w=1, WoO , 


—_> 
where l*w = 1, w O enforce convex combinations of the 


IV 


extreme points. 
Substituting this form yields the Dantzig-Wolfe master 


problem, at the KtA iteration 


min cxkwk (duals) (2).@)} 
st (Ax) wk ery Wy (352 
Tw = i Uo (33) 

wk > 0 (3.4) 
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where XX is a matrix of extreme points collected so far, and 
(3.3) and (3.4) are the convexity constraints. 

Each time the master problem 1s solved, we check to see 
if it is an optimal solution by computing the reduced costs 
for the non-basic extreme points of F. It is not necessary 
to check them all since the most negative reduced cost is 


found by solving the problem 


Meme C-UzA)X — Usp o>) 


st me F a 


which produces an extreme point of F. This is the Dantzig- 
Wolfe subproblem, which for MCTP is a set of single-product 
network problems which are easy to solve. 

If xk solves the subproblem at the kt iteration and 
(c-u,A) xK-u, < 0, then the reduced cost associated with x 
is favorable, so xK is added to the collection of extreme 
points and we return to the master problem. iit (c-u,A) xk- 
u°- > 0, then the reduced cost for xk is unfavorable. Since, 
due to the minimization, it has the minimum reduced cost, 
then all non-basic extreme points of F have unfavorable 
reduced costs, and the process terminates with the solution 
to the previous master problem optimal in MCTP. 

A second approach may be taken to the dual decomposition 
algorithm which considers the decomposition as a cutting 


plane process (Kelley, 1960), resulting in a master problem 


oye 


which is dual to the standard master problem (e.g., Graves 
and Van Roy, 1979). 


The problem to be solved is 


(P) min cx (dual variables) 
st Ax < by (uj) 
ix =" oo (uz) 
x. > 0 


whose dual is 


(D) max ub Dy, U5b5 
U7 tetoNe= Cc 


Ug 2S “Upgeer mee 


The feasible regions associated with the various constraints 


are 
Fpy = (4p) 
Foo = {x|Naw = b>} 
Bp = Fpqmeand Re5 and «(xt 0) 
Fp = (U|Ugeeet UgN < c} 
Fp = Ppa and. (ajuqge< ©}. 


The respective values of the primal and dual problems are 
written V(P) = cx and V(D) = ub. 
Let f(u,x) = cx - u,(Ax-b,), which we recognize as the 


Lagrangean of (P) with respect to the constraint set A. 


Lemma 3.1; If Xc¢« Ppp, xX 2 O, U ER, Up = OF 
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Proof: Write the identity: 
Dope ee eDaeeec Dae > U5 (NX=b5) - (@=u,A-u5N)x. 
Rearrange and substitute: 
Me) Sites U5 (Nx—-b5) = (c=-u,A-u5N)x. 
But the two right-most terms are < 0, giving the 


desired result. QED. 


This suggests that by fixing uj, theme £ollowang 


subproblem results: 


(SUB(u}z) ) Mite = (Uy ,o) — U,b, + (c-uj,A)x 
st Nx = D> (Onley) 
xX > 0 


This is the Lagrangean relaxation of (P) with respect to 
A, with u, fixed. If (SUB(uj)) is infeasible, so is (P); 
otherwise we generate u,; using a master problem in a 


convergent algorithm. 


Let uk = (u,*,u5*) be a composite dual solution at step 
( 


Keeot the algorithm. The following lemmas describe its 

properties: 
Lemma 3.2: Let xk and u5* be respective optimal primal 
and dual solutions of (SUB(u})). Then uXb = 


f(u,*,x*) = V(SUB(u,)), and uX « Fp. 


oie 


Proof: Again, use the identity 
ub =Vex=u) (Ax=b7) =u5 x=) > ) = (Cane e 
So ukb = f (uz, *, xk) -u5* (NxK-b>) - (c-uzKa-us kn) xk. 
Since u5k,xk are optimal in (SUB(uz)), by 
complementary slackness 
u5* (NxK-pb,) = (c-u,Ka-u5kn) xk = 
(SUB(u,*) optimal implies 
(c-u5 KN) -u,Ka > 0, and u,k < 0, yielding uk ¢ Fp: 


QED 


According to Lemma 3.2, for any u,k < 0, a feasible 


solution uk = (u,*,u5*) € Fp can be constructed with value 
V(SUB(u})). Let u be the best solution to (D) currently 
known. The following lemma establishes the necessary 


conditions for improving the incumbent solution, u. 


Lemma 3.3: Let K = {k| xk é Fp>,xX* aero}. A necessary 
GOnea) Clon aor V (SUB(ux, ) to yield a dual 


solution value better than the incumbent value 


ub is that f£(u,xk) > ub+e, k € K, for some 
S i> 401. 
Proof: xK « Fo is feasible in (SUB(u,)) for any uj, so 
V(SUB(u,)) < f(u,x*). For V(SUB(u;)) > ub, uz must 


satisfy f (u,,xX) = ub+e for k ¢« K and e > 0. Qa 


The existence of a convergent algorithm for finding u,* 


is established as follows. 
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Lemma 3.4: The sequence (SUB(Uu,)) with arguments (uz*) is 


Proof: 


finite if {X « Fp9,x > 0} 1s bounded, V(D) is 
finite, and at any step k, u,* satisfies 
f (u;,*,x) > ubte for 1 = 1,...,(k-1) for some 


e > Ox 


SUB (u;X) yields basic solutions, xk, which are 
finite in number. The lemma follows if any 
repeated basic solution yields termination of the 


sequence. By lemma 2, ukb = f£(u,*, xX) and uX « IP aye 


If xK = xl for some 1 < k, then uXp = f (uz, *, xX) 


f(u;,*,x1) > ubte since u,* satisfies £(u,*,x ) 


IV 


‘ubte for 1l1= 1... (kK-1). Thus u,K 1s a new 
incumbent, and with e > O and V(D) finite, there 


may only be a finite number of such updates. QED 


This criterion leads naturally to the master problem: 


f(u,,x! ) = cx! -u,(Axl-b,) > ubte, 1 = 1,...,k 
( 


uz, < 0 


Notice that the objective function is unspecified: any 


Objective will suffice. Experience shows that the most 


recent cut works well in practice. That is, 


(MP (x) ) 


max £(u,,xX) 


st f(u,,x1) = cxl -u, (Axt ee 22 ubte, 


>) 


may be unbounded, 


MP (xX) 
that f(u,xt) > ubte, 
V(MP(x)) < ubt+te, the 
solution. 

According to the 
solution to (P) can 
frnitce. 

Lemmams3 . 2 err 


dual solution to MP(x) 


V(MP(x)) 1s finite, 


in which case any wu, < O such 


l= 1,...,(K-1) Wile cueeaee. When 


process terminates with an e-optimal 


following lemma, a feasible primal 
be recovered whenever V(MP(x)) is 
> 0, l= i1,...,(K-1) be the optima 


at iteration k. its 


a primal feasible solution 


to (P) 1s 


xK = (xk + J) 


k-1 
wyxt}/[1 + ) 


k-1 
W1] 


Proof: 


MP(x) 1S restated in canonical form as 


-u, (AxK-b,) + cxk 


max 
-u, (b,-Ax1) ce cxl ~[{ub+te], l= 1,...,(ie (WwW) 
By duality, the optimal dual solution, w, satisfies 
k-1] 
) = (b,-Ax1) wy < b,-Axk, yielding 
l=1 
k=-1 “— 
are + ) smeeby < (et w1]b 
141 1 46] ) J. 
Therefore, xX « Fpl - Also x* if a cOnwex 
combination of xt « Fp, so xK « Fp: QED 
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— —, 


Lemma 3.6: If V(MP(x)) < ubte, then x and u are e-optimal 


Proof: 


primal and dual solutions to (P). 


V(MP(x)) = £(u,%*1,xk) = cxk-u,ktl(axk-b,) (primal) 


cxk + ) texl-(ubte) Jw] (dual) 


roxK+(ub+te)][1 + y w1]+(ub+te) J 
Rearranging yields 

cxK-(ub+te) = [£(uzk+1, xk) -(ub+ espa il + ) wr] and 
cxk < £(u,Kt1,xK) = v(MP(x)). If V(MP(x)) < Ubte, 
then cxk < ubt+te. If u* is the optimal solution of 


(D), since xK ¢ fo, and 


ub < u*b, u*b < cxk < ubte < u*b+e. QED 


Finally, we show that once the master problem becomes 


bounded, 


Lemma 


Proof: 


it remains that way. 


3.7: The value V(MP(x)) remains bounded once 


F(MP(x)) becomes bounded. 


V(MP(x)) is finite if F(MP(x)) is bounded. Once 
F(MP(x)) becomes bounded, subsequent iterations add 
constraints and make existing constraints tighter 
by updating ub, further restricting the problen. 


QED 


The dual decomposition algorithm, DDC, based on the 


preceding theory follows. 


Se, 


Algorithm ve. 

step 0: Specify uy" < ©, © > 0,7 k — ieee 

step 1: Solve (SUB(u,*)) for xk, uk (i.e., £(u,*,x*y 3S 
ubte, xX = Fp> ,x* 20) 
If infeasible, stop with (P) infeasible 
TE ukp = (u,%,ugk)b > Ube update incumbent 


Solute ron. 


step 2: Solve (MP(x)) for u,ktl 
< ubte, declare xK u e-optimal for (P), stop 
= ub+e and finite, xK « Fp2, k = k+l, go to 
step 1 
Li  VaGiire (xp) +o, use any u,ktl & 208 f(u,Kti, xk) > ubte, 
k = K+1l, go to step 1 


is infeasible, STOP 


Since the master problem yields feasible primal 


solutions, it provides upper bounds, 


V(MP(x)) > cxK > u*b 


Retaining all cuts may become too expensive in time or 
space in solving MP(x). It 1s possible to conduct the 
algorithm as a heuristic by retaining only a fixed number of 
eubs . This may degrade the quality of the solution 
obtainable at any iteration but does not prohgoit 
convergence. A discussion of cut-dropping strategy iS 


provided in the chapter on computational experience. 
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The value of e need not be fixed. A large value may be 
used initially which may be reduced as inconsisten-cies are 
encountered in MP(x). Parametric analysis may be used to 
find a maximum e, or e may be reduced in a fixed algorithmic 
sequence of relaxations of cut aspirations. 

As a practical matter, the sequence of master problems 
behaves much like a first-order descent method in convex 
nonlinear programming. Each Clits is a tangential 
approximation to the objective function as can be seen in 
Figure 3.1. Just as in nonlinear programming, we can expect 


oscillation as the solution sequence progresses. 





Figure 3.1 Tangential Approximation to Lagrangean 
Dual Function 
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In order to deal with such oscillation, a local 
neighborhood (or trust region) may be specified for uj. 
Initially, this neighborhood is relaxed to encompass values 
of u, sure to contain the optimum (1.e., viewing u, as a 
penalty per unit of violation of joint capacity constraints, 
we wish to assure a feasible completion). Subsequently, the 
trust region can be restricted when oscillation is apparent 
(e.g., whenever V(MP(x)) 1S non-monotonic improving). 

As an additional stabilizing influence, we introduce 
decomposition goals for the master problem variables. These 
goals may be violated at a small linear penalty cost. 

All the essentials for convergence are preserved as long 
as the cuts are satisfied hierarchically before the goals. 
In the same vein, the trust region and cut dropping 
heuristics do not present a serious impediment in practice. 
Brown, Graves, and Honczarenko (1983) developed similar 
mechanisms for primal goal decomposition of mixed integer 
models. 

DDC cuts can be restricted to Dantzig-Wolfe cuts if we 


force a maximal solution: 
ppc cuts cx - u,(Axl-b,) > (ubte), D ce 1 pow 


DDW cuts max Uo 


st cxl = u, (Ax}-b1) 2 Uo Be uzb, it m1,...,8 
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In this sense the qual of the Dantzig-Wolfe master 
problem is equivalent to the DDW master problem. Taking the 


dual of (3.1)-(3.4) yields 


max Uj zby + Uo 
st u,(AX*) + ug < (cx)! 


uz < 0 


Rearranging and writing each constraint individually, the 
result 15 Up < cxl - uy (Axt) for 1 = 1,k. Adding u,b, to 


each side results in 
UG teu by < cxl - uy (Axt-b,) BOE] = lpn. -,K 


Before updating DDC to include all the above innovations, we 


introduce the following notation: 


X is the primal incumbent, 


xK is the matrix of extreme points generated up to 


iteration k, 
—— f 
V is the upper bound, 
e,ef > O are initial and final convergence tolerances, 


R,R¢ > O are initial and final trust regions, and 


Nhola Nf 2 1, integer are number of cuts held, and 
total number of cuts allowed, and 


We 1s exponential moving average of cut weights. 


The updated algorithm DDCl1 becomes: 
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step 0: (initialize) 


Specify e > 0, ef > 0, R= 0, Ree serena 


IV 
~ 


Set ub = - 
step 1: (solve aggregated problem) 


Solve min ¢c)X, st Nyx, = bs, 0 < x] < by 


where b, = ) bop 
p 


HOmseinicle Uy <0 


Set initial decomposition goals. 


step 2: Solve (SUB (u,X) ) for xk, u5* 
Generate cut 


If uXb > ub, update incumbent solution. 


step 3: (optimal capacity allocation) 
Set yx — CA(xk) , solve RS (yx) with age optimal 


Ate v(yX) < V, update V, x 


Generate cut. 


step 4: Solve (MP(x)) for u,kt1 
Update decomposition goals 


If V(MP(x)) < ub+e, set @ = Max (e/2 ,er) 


It © > Ge, repeat step 4 


If V(MP(x)) not monotonic, set R = max(R/2,R¢) 
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step 5: If V(MP(x)) > ubte and finite, xK . F 
| so, use u,ktl < 0, f(u,k*t1,x2) > ubte 
is infeasible, STOP 
Set cut weights wkt1 = swk + .2wk 
If solving relaxed master problem, recover primal 
solution (i.e., xpi = xkwk) 


aleaty exnn < y update V, X. 


step 6: (termination tests) k = k+l 


Ine a Sos eb ate| ub < V-e go to step 2. 
Cut generation is performed as follows: 


Generate cut 

n = n+l 

if n > Npoigq then 
locate slack cut with minimum weight and replace 
it; if no cut is slack, locate taut cut with 
minimum weight and replace at, also relax upper 
bound from recoverable primal solutions. 

Generate cut of desired class (e.g., Dantzig-Wolfe, 


PDC. scene 
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B. A PENALTY ALGORITHM FOR LINEAR PROGRAMS USING 
RESTRICTED SIMPLICIAL DECOMPOSITION 


Penalty functions have not been seriously considered as 
vehicles for large-scale mathematical programming in the 
past because they introduce nonlinearity (Geoffrion, 1970). 
However, the recent advent of interior point or logarithmic 
barrier function methods has shown that - non-linear 
approaches to linear programming are at least viable and 
offer attractive convergence rates and complexity properties 
compared with simplex-based methods (Karmarkar, 1984; Gill 
et al. 1986). In this section we develop the theory and 
present an algorithm based on penalty function concepts 
which 1s well-suited to large-scale optimization 
applications. The method has strong parallels to Dantzig- 
Wolfe decomposition, using restricted Simplicial 
decomposition to produce a nonlinear master problem and 
linear subproblems. The process incorporates Lagrangean 


lower bounds in a natural way, but produces infeasible 


solutions in the master problem. We show how to use 
capacity allocation/restrictions on these infeasible 
solutions to produce improving upper bounds. The result is 


a convergent algorithm which displays a superior convergence 
ra@se in preaietice. To set the stage, we begin with a 
discussion of penalty function theory. 

Penalty algorithms approximate constrained optimization 
problems by unconstrained (or partially constrained) 


problems. This is done by placing the constraints into the 
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objective function with a penalty parameter which exacts a 
large price for any violation of the constraints. Phys 
relaxed approximation to the original problem becomes more 
accurate as the penalty parameter is increased. MThus, the 
penalty algorithm generates a sequence of infeasible points 
which converges in the limit to an optimal solution of the 
original problem. 

From the general form found in (1.6), we introduce the 


specific penalty form of a linear progran, 


(PP) min q(h,x) = cx + P(h,x) 

st x e¢«F 
where P(h,x) = 21 1a(x) 11E, lt-lle indicates the th norm, 
Q(x) = (Ax-b,)*, (Ax-b,)* = max(0,Ax-b,), scalar h > 0, 
eat < 2, and F = {x|Nx = bo,0 Xp < bi}. This is a 


common form found in any nonlinear programming text (see, 
for instance, Bazaraa and Shetty, 1979, or lLuenberger, 
1984). Recalling that Jy, is the set of currently violated 


constraints, the objective function may also be written as 


min q(h,x) = cx + J 2 a3 (xt ; (3.2) 


JeJy 


Since the penalty terms are convex for h > 0O, the 
objective function remains convex, and for t #1, the 
objective function is differentiable everywhere. Convexity 


is proved for the quadratic penalty function in Appendix 
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A: convexity arguments for other forms may be found in 
Bertsekas (1982b) and Luenberger (1984), for example. 

Due to the convexity of q(h,x), we may employ the 
standard mathematical result that x is optimal in q(h,x) if 


and only if first-order stationary conditions are met: 
vq(h, X)i@ee) => «(0 (38) 


for all x « F. For Vaq(h,x)(x-x) < 0, (x-x) represents an 
Mmproving dalrecemem: 

The following penalty function characteristics, stated 
in terms of PP, are taken from Luenberger (1984). Let (nk 
and (xk) be sequences of penalty parameters and associated 


optimal solutions, with hk+1 5 nk, and h°o > 0. Then 


q(hk,xK) < q(hkt1,,kt1) 
P(xk) > p(xkt1) 
cxk < oxktl 


and 
cx* > q(hk,#®) > xk , 


where x* is optimal in MCTP. Thus, solving the penalty 
problem for any h = hk > 0 produces a lower bound on MCTP 


and 


lim q (hk, xk) = ex*™ with xk « x 
ko 


68 


This large-scale nonlinear version of the original 
linear problem is useless without an effective means of 
solution. We base our solution algorithm on the promising 
method of restricted simplicial decomposition. 

Restricted simplicial decomposition (RSD) was developed 
by Hearn et al. (1984) to solve large-scale pseudo-convex 
nonlinear optimization problems. The method decomposes the 
Original problem into a nonlinear master problem and a set 
of linear subproblems (assuming linear constraints). At 
each iteration, the subproblems generate a new extreme point 
of the feasible region for the master problem, and the 
master problem minimizes the original objective function 
over a Simplex of retained extreme points. The master 
problem in turn provides new cost information to the 
subproblems via yf (x) where f is the objective function of 
the master problem and x is the current solution. The 
process terminates when no favorable extreme points remain. 
It is termed "restricted" since only a fixed number r of 
extreme points are retained at each iteration. 

When r = 1, RSD specializes to the algorithm of Frank 
and Wolfe, 1956. In this case the master problem becomes a 
minimization on the line joining the current point x and the 
extreme point just generated. Many researchers have noted 
the slow convergence of the Frank-Wolfe algorithm (Ali, et 
al., 1978; Meyer, 1974; Wolfe, 1970); a simple example taken 


from Wolfe (1970) makes this evident. Suppose we are trying 


69 


to find the point of a polytope which is closest to te 
origin: in Figure 3.1, this is the midpointyof thesbacemen 
the triangle. At each iteration, linearizing the objective 
function and solving the resulting LP leads to vertex B or C 
of the triangle. Thus, as the algorithm progresses, the 
line search proceeds along a direction nearly orthogonal to 
the gradient VE (xX) causing zigzagging. However, simply by 
retaining two extreme points and optimizing over the simplex 
formed by the extreme points and the current iterate, the 
problem in Figure 3.2 is solved optimally on the second 
iteration. Indeed, Hearn's computational results show 
Significant improvement as r is increased from 1 (Hearn, et 


al., 19eane 


Figure 3.2 Frank-Wolfe Zigzagging Example 
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Using this method on PP decomposes the problem into a 
master problem, which minimizes q(h,x) over the set a 
retained extreme points, and a subproblem consisting of a 
set of decoupled network flow problems with modified prices. 
The master problem is simple to solve since it has only a 
convexity constraint like (333) and nonnegativity 
constraints on a number of variables only equal to the 
number of retained extreme points. Furthermore, Hearn shows 
that the method is convergent even if the master problem is 
only solved approximately. 

The decomposition of the penalized MCTP is formed in the 
following manner. Let X be a matrix whose columns are 
retained extreme points of F, and r be the number of points 


retained. Then the master problem becomes 


(MP) min q (hk, xkw) = cxkw + P(hk, xkw) 
ae 
st: lew = 1 
w > 0 


This resembles the Dantzig-Wolfe master problem, except that 
the penalized joint capacity constraints now appear in the 
objective function. Let m = ) ean = xKkw be the optimal 
Eoulelon to MP at WPion x, and Jy be the set of 


constraints violated at xk. The corresponding subproblem is 


(SP) min Saat’, S52 | (3.4) 


st x ¢« F (3.5) 


7a 


where 
yachk, xk) = ctyp(nk,xk) = ctnkg(xk)VQ¢xk) = ct+h*(axk-b) ta 


Comparing this form to the objective of the Dantzig-Wolfe 
subproblem (3.5) reveals that -h*(axk-b,)*+ is an estimate 
for the dual multipliers, u,;. Thus, we make a new estimate 
of the optimal multipliers based on the violations in the 
current master problem. To avoid notational confusion, we 
let u,k = nk (axk-p1l) + in the rest of this section. Now, if 


xK solves SP and 
er), 329) (Esco) een | 
then xk solves PP for h = hk; otherwise (xk-xK) is a 
favorable direction, so we add xK to the retained extreme 
points and return to the master problem. 
If xX is optimal in PP for h = hK and x* is optimal in 


MCTP, then 


q (hk, xk) < cx” , providing a lower bound on MCTP. 


as 


However, since xK is probably not an extreme point of F, we 
must identify the facet of F containing and solve MP 
exactly to establish this bound. We use other information 
to provide intermediate bounds. 


Recall that the Lagrangean relaxation of the linear 


program, with u, fixed, may be written 


ye? 


LR(uj) ine (Ge > es) x Se ua) 
st xe 
Using the set Jy, associated with any solution to the master 


problem, we let 


| ore ase J g Jy 329 ) 


415 
| fl (U 215) 4) Bee SS ny 


and rewrite SP as 


w 


min (c + u,A)x 


Se eee 
Thus, we observe that 


V(ujz) = Vq(hX,xk)xk - ub, , (3.10) 


* 


so at each iteration we compute u,b, as we compute new 
subproblem costs and adjust the global lower bound whenever 
(3.10) exceeds the current lower bound. 

Since the penalized problem is convex, it is also 


possible to linearize the objective function at xK to 


establish lower bounds via 
q(hK, xk) + Vq(hk,xK) (x-xK) < cx* 


However, the following lemma establishes that the 
linearization is always dominated by the value of the 


Lagrangean relaxation with u,* = h((Axk-b,)+)t71. For ease 
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of presentation here, we use the non-standard vector form 


[ (AxK-b,) + 


ee meaning [max (0,A5xK-b,4) J ° for each element in 


the vector, and we drop k from xk anda uk, 


Lemma 3.8: Let uy = h{ (Ax-b,)*+)t71 at point x « F. @iem 


Proof: 


The 


q(h,x) + Va(h,x) (x*-x) < ¥V(uz) < cx", whem 


xk = argmin {vq(h,x*)x SG x= < oh) 


Oro Ue eee (Ax-b,)tyto1, vath,x) = (ctugage Now 
q(h,x) + vy q(h,x) (xK-x) = cx + P(h,x) + cxk 4+ 
VP(h,x) xk = cx = P(h,x) x = cxk 4 P(h,x) aa 


VP(h,x) (xX=x) = (c+u,A)x* ar P(h,x) = VP(h,x) x Since 
VP(h,x) = h[{(Ax-b,)t}t-1a = uA. By definition, 
v(uly = (c+u7A) xk = ub, <“ex*, so our reswituame 
proved if yP(h,x)x - P(h,x) > U,b,. Now VP(h,x)x 


P(h,x) = u,Ax - =h({ (Ax-b) +] t71) ' (ax-b) + = u)Ax- 


U, (Ax-b})t. But, since 15 = 0 Whenever A;x-by5 


lA ctl 


O, we see that 
u Ax =F =, (Ax-b,)* = n (=) dy Ax ae =04by 


Since uy A4x > uy 4b- fer all J, “then (£2 *) uyax + 


u,b, i u,b}. OED 


(relaxed) penalty form of MCTP produces a sequence 


of infeasible x's which is only guaranteed to converge to x” 


in the limit. Therefore, the penalty algorithm does not 


naturally produce intermediate feasible solutions or upper 


bounds. 


In general, this 1S unacceptable, so we rely on the 
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improving nature of the sequence of x's, uSing projection 
and restriction to generate intermediate feasible solutions 
and upper bounds for MCTP. 

Suppose h is fixed and x solves min q(h,x) POGeexec iF. 
Paen, Lf xk solves the master problem at iteration k, either 


xk = x or a favorable extreme point is added to the master 


problem with 


q(h,xXt1) =< q(h,xk) 


If xX = x, increasing h to h' > h and resolving MP over the 


same extreme points yields 


a(n, xX) << q(ht,xkt2) 

By sampling the sequence periodically as xk , x* and forming 
a capacity allocation according to (2.11), yx = ca (xk) (with 
scaling to allow non-integer solutions), we will form 
allocations in RS(y) which improve as x improves, thus 
obtaining improving feasible solutions and, therefore, upper 
bounds. These resource allocations may be performed 
whenever desired; we choose to do an allocation every pth 
iteration where r is the number of retained extreme points. 

The following lemma shows that the allocations must 


eventually converge to an optimal allocation. 


Lemma 3.9. Let (xk} be a sequence solving MP as (nk) CO, 


yX = CA(xK), and may solve RS(y*). Assuming 


7S 


the solutions are scaled to provide integer 


allocations, as pee, ae , v(yX) + \* and 
as + x™, 
Proof: lim xk = x* for the penalty problem, where cx* is 


optimal in MCTP (Luenberger, 1984, Chapter 12). 
Since yx = CA(xk), as h-o, y* = lim yk 4 Ch Gee 
Since x* is feasible, Ypj = Xpj- for all p aneigye 
Thus ee - CXy™ = ese But Xy is feasible in 


MCTP by construction, so CXy™ > oars Therefore 


CXyx = cx”, and xy solves MCTP. QED 


The algorithm proceeds by iteratively solving the 
subproblems and master problem, and periodically solving the 
restriction to generate new feasible solutions. The value 
of h is increased whenever 

1) Va(h, xk) (xk-xk) > 0, 

2) RS(y) will be solved in the current iteration, or 

3) min _—' <email 
The algorithm terminates when a feasible solution is 


generated in the master problem, since it must be optimal, 


or when 
(V=/e ae 
for some e > O. 


The algorithm uses the following additional notation: 
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matrix of retained extreme points at iteration 


xk = current master problem solution 

XM = previous MP solution considered in current MP 

xk = a matrix formed by augmenting xK with Xm, Written 
xk U XM 

conv(X) = convex hull of X 

Jy = set of capacities violated by the ktA solution to 
MP 

a > 1.0, a scalar multiplier for increasing h 

x = current incumbent for MCTP 

e = power of the penalty function, 1 < t < 2 

Ie > 1, integer; maximum number of retained extreme 
points 

2 > OO, a stopping parameter 

CA(X) = a capacity allocation based on x using equation 

x G2), a} 

V,¥ = current upper and lower bounds on MCTP. 


The Algorithm RSD(P): 

iiputs The network T = {IgJ}, and joint capacity vector, b, 
and, for products p = 1,...,|P|, a cost vector, Cy, 
and supply/demand vector, b»5 


Output: Incumbent solution x, and incumbent value cx. 


Peer (Initialize); Select h° > 0, e> 0, a>il, r>l, 
set k = 0 
Solve LR(0), with r optimal; set V = V(0) 
Form Jy° = (51 A5x° - bi43 > 0) 
If Jy° = Y, stop with x0 optimal 
Eise, set y = CA (x°) 


Solve RS(y), with Xy° optimal; set V = V(y)- 


va 


If (V - V)/V < -eoStep with incunberaee— xy? 


Else, set X° = 6, 2x =ex® 


step 1 (Solve Subproblem) 


Let xk = argmin{v q(hk, xk) x|x ef} 
where 


If qi aie =) > 0, xk solves MCTPP for h = hk 


set V = q(hX,xX) 


m | 


LG (V = v)/V < e, exit with incumbent 
Else hk = a-hX 


Go to step 2. 


Else compute V(u7) vq (nk, xk) xk = u,b7 
Ese v(uq) 2 Vio = Vie 
If (Vv = v)/V < e, exit with incumbent x 
Else 
i) rf [xk] < rx, xkt+l = xk u yk 


11) > i \xk| = r, drop the column of xk which had 
the smallest wy, in convex combination forming 
xk and replace it with xk to form X 


xy = xk 


+1 _ + 
Set xKtl = xkKtlu xy 
Step 2 (Solve Master Problem) 


nn 
Fan 


xkt+1 = argmin (q(nX,x), x «€ conv (XX) } 


7 


Discard all columns of xk+1 with wy, = 0 
Form the set JyX+1l = (3|A;xk+1 - bj; > 0) 
If k is an integer multiple of r, do a resource 
allocation: 
Set y = CA(xkt1) 
Solve RS(y), with "a optimal 
Tet V(y) Es set V = v(y), x = aT 
igs (V - Vv) /V < e, exit with imeumbents 
Else k = k+l 


Go to step 1. 


One possible difficulty with the penalty form of the 
objective function is that only the prices associated with 
the currently violated set of joint capacity constraints are 
adjusted at each iteration. There is no intermediate 
adjustment of multiplier values for constraints previously 
but not currently violated. Furthermore, convergence to 
optimal multiplier values for MCTP is guaranteed only in the 
hime, that isjacas h > ®. These potential problems are 
overcome by the augmented Lagrangean form of MCTP. As the 
name suggests, the objective function is formed by adding a 
penalty term to the standard Lagrangean dual form (Hestenes, 
BS 7:5). Since the theory of this augmentation is developed 
in terms of equality constraints, we express the joint 


Ccapacitation constraints of MCTP as 


(A5x-b,4) + m4? =Q for jeJd 


ho 


The augmented Lagrangean problem is then 
min L(x,U7,h) 
Men 
=cx + J [0,5 ((A3y7by5) 4m52)+hh|Ayx-by44+m42/2}, x « F 
a5 a a) ete) To) 2 J 1°") d 


Witch Uq eOe 


Bertsekas (1982b) demonstrates that an equivalent 


objective not requiring the variables m is 


L(x,u ,,h) = cx + J. € {max (0,U,5+h(A4x-by5)) }2-uy52) 
Jed 


This amounts to a multiplier update 
u, Kt = max{0,uk th (A4x-b14) } fer allie.) oa 


Using vector notation, the problem is stated as 


Mime (x, U,,h) = cx + Pix, usm 
st x €  F 
n l An . 5 ~ 5 
where P(x,uz,h) = sp(lluz'l!* - |Jug/14) 
and Uy =a) h (Ax-b,) ]* 


By analogy to the penalty version of the algorithm it is 


”~ Aw 


easy to see that the subproblem, min VL(x,u,,h)x for x e F 

may be restated as man (c+u 7A) x; st. xX ¢€ F, and a lower Doeuna@ 

on MCTP is derived by computing min TE ey nye 
Bertsekas points out certain advantages to solving this 


augmented Lagrangean form of the problem. First, the method 


will converge to an optimal solution for a finite value 
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ent Ig Second, the rate of convergence of this augmented 
form is superior to the penalty form, which depends on the 
rate of increase of h, thus substituting convergence rate 
for ill-conditioning concerns in the master problem. 

We conclude this section by demonstrating that RSD(P) is 
a convergent algorithm. Hearn, Lawphongpanich and Ventura 
(1985) present a proof that RSD either terminates in a 
finite number of iterations or generates a sequence bxky 
for which any subsequential limit is a solution. For the 
penalized version of the MCTP the proof may be simplified, 
relying on the simple assumption of a bounded feasible 
region (1i.e., 0 < Xp < b, < ©. 

In preparation for the main result, we first show that 
solving the master problem for fixed h produces a sequence 


of improving solutions (Hearn, et al, 1985). 


Lemma 3.10: Let xk solve the restricted master problem at 
iteratwoen k, with h = h. If xk is not 


optimal in PP, then q(h »xktly < q(h Uxky, 


Proof: At iteration k, xk is the current iterate and x 


solves the subproblem with vq(h, xX) (xk- xk) <a oe 


Thus xX « xKtl ana xktl = xk+lu yk, Now let 
xk+1 = argmin{q(h,x) s.t. x « conv (xkt1) }. Then, 
q(h,xXt+1) < g(h, xX) since xk « xkt1l, Now, LIE 


q(h,xktl) = q(h, xX), then xk also minimizes the 
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master problem at iteration k+l and vq(h, x) (x-x) 
> O for all x « xkt1, But since xk « xk+1> this 
contradicts the assumption that (xK-xK) is an 


improving direction. QED 
The main convergence result is stated as follows: 


Lemma 3.11: Let F be a bounded, non-empty set of feasible 


Single commodity network flows. The RSD(P) 


algorithm either solves PP for h=h in a 


finite number of iterations or converges to 


an optimal solution x as the limit of an 
infinite sequence. Furthermore, as h is 
k 


increased to ~, q(h,x) > cx™ and (x) > 3 


where x* is an optimal solution to MCTP. 


Emoot < Wet x “solve min q(h,x) st x c¢« F. Since F is non- 


empty and bounded, choose any starting point 


Moose HK, Giving q(h,x) < Gighwee) < ~ Then at any 


iteration either x* = x or 


—_—- A 


q(h,x) < q(h,xk+1) < q(h,xK) < q(h,x°) 
Letting aX = q(h, xk) - q(h,xk*1), then ) a* = 
= 0 


g(hee?) - a(h,x), so = aK = 0. Since only x 


varies, lim |xK-ykt+1 | = 0 and, since ee q(h, x) = 
k~+>0©o ; -> co 


qth =) , aren (xk } > Oe. 


Since x « F, we set h to h > h and continue 


the algorithm. By Lemmas 1 and 2, Section 12a] .:¢ 
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nw MN 


Luenberger (1984), q(h,x) Seacrn 7 xX )S cx”. Finally 


by Section 12.1, Theorem 1 of Luenberger (1984) 


letting h**, we have lim q(h,x) = cx™ and 


CO 2x. OED 


It is not necessary to solve q(h,x) to optimality for 
each value of h to obtain convergence since all extreme 


points of the current simplex are elements of F. Thus, 


#N — 


whenever it is desirable, we may set a new h > h and resolve 


“~ 


the master problem with new objective function min q(h,x) 


“a 


over the same simplex to obtain a new xk, The subproblem 
costs are then based on Vq(h,xk) . Appropriate opportunities 
to increase h have already been discussed. 


The preceding proof demonstrates that the algorithm is 


theoretically convergent without resource directive capacity 


allocations. Indeed, Bertsekas has shown that, for proper 
choice of sequence oly ae penalty algorithms may be 
quadratically convergent (1982b). However, in practice, 


awaiting convergence for large problems may be impractical. 
Therefore, we use the resource allocation steps to help 
obtain near-optimality quickly. As with the resource- 
directive approach discussed in Chapter II, we always employ 


Simple reallocation to obtain the best available solutions. 
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IV. COMPUTATIONAL EXPERIENCE 


The algorithms presented in Chapters II and III have 
been coded in FORTRAN and tested on various versions of a 
large scale MCTP using an IBM 3033AP under both VM and MVS 
operating systems. For comparison, some four and ten 
commodity problems have been solved using the X-system of 
Brown and Graves (1974), which exploits the generalized 
upper bound structure of the complicating constraints in the 
MCTP and employs advanced starting solutions of the network 
constraints. These control problems were solved using an 
IBM 3081K, which iS approximately twice as fast as the 
3033AP for the X-system. 


Throughout this chapter the following acronyms are used: 


DDC = dual decomposition, 

RDLB = resource direction with Lagrangean lower bounds, 
RSD = restricted simplicial decomposition, 

RSD(A) = RSD of the augmented Lagrangean form, and 


RSD(P) 


RSD of the penalty form. 

The particular test problems used are described in Sec- 
tion A, issues concerning individual procedures are studied 
in Section B, and comparisons among algorithms are presented 
in Section C. Whenever the word "gap" is used to describe 
the quality of a solution, it means (vay) Jv, where % ee 


and v* are the upper and lower bounds, and optimal solution. 
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A. TEST PROBLEMS 

We demonstrate our methods on a transshipment model for 
delivery over time of military products from production and 
storage locations to overseas locations to support theatre 
operations. The model covers five physical echelons, 
including production plants, storage depots, ports of 
embarkation, ports of debarkation, and geographic field 
locations. Road, rail, sea, and air transportation are 
modelled, and product demands are time-phased. Capacitation 
occurs primarily on sea and air links, and as throughput 
capacities on transfer points, requiring replication of some 
echelons. 

The objective of the model is to minimize deviation from 
on-time deliveries. This 1S accomplished through a 
specified set of backlogging arcs with graduated penalties 
and a system of penalized "artificial" arcs which direct 
flow around the network to satisfy "undeliverable" demands 
and to equilibrate supply and demand. The products all use 
a common unit of flow and incur a common cost on each arc. 

The network is abstracted in Figure 4.1--a more detailed 
description of the model may be found in Staniec (1984). 
Computational tests use an underlying network of 3,300 nodes 
and 10,400 arcs, of which 1,071 are subject to non-redundant 
joint capacity constraints. A set of basic test problems of 
between four and ten products is used for detailed inter- 


and intra-method testing. Results of these tests are 
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reported together in the following sections. Three 
competitive methods are tested on a final 100 commodity test 
problem in the last section to illustrate the effectiveness 


of these methods on a typical, large-scale problem. 
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Figure 4.1 Simplified Ammunition Distribution Model 


The scope of our computational study is limited to 
problems drawn from this one generic model, but we think 
that these problems are generally quite difficult to solve. 
Because of the explicit system of, penalized artificials, the 
costs range over five orders of magnitude, which have the 


potential to cause difficulty in real arithmetic. Also, 
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because the model has multiple shipping modes, covers nine 
spatial echelons, 21 time echelons, and has a large 
backlogging system, repair of joint capacity violations 
requires extensive effort. Therefore, these test problems 
present a rigorous challenge for the solution algorithms. 

We have graded the basic test problems used as easy (E) 
or hard (H) based on two criteria. The first criterion is 
the number oof violations encountered in the first 
relaxation. This parallels the criterion of Ali, et al. 
(1984) in evaluating direct simplex methods used on some 
medium-sized MCTPs: the number of tight joint capacity 
constraints. The second criterion is the magnitude of the 
initial violations, which is an indirect measure of the flow 
changes required to reach the optimal solution. Together 
these criteria affect the size of the gap between the 
initial finite upper and lower bounds. This is evident in 
the RSD and RDLB procedures, where the initial feasible 
solutions are based upon allocations from the first 
relaxation: the larger the violations, the poorer the 
initial allocations. Among the problems we test, the most 
difficult is 10H, on which RSD and RDLB generate a 56% 
initial gap. 

Test problem specifications and direct solution times by 
the X-system are presented in Table 4.1. Initial gaps are 


from the RSD algorithm. 
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TABLE 4.1 


OPTIMAL SOLUTIONS FOR BASIC TEST PROBLEMS 


PROD. INIT. OPTIMAL SOL'N INIT. LARGEST A-SYSTHM 


GRADE GAP VIOL. VWIOEey TIME (SEC*) 
% ARC GRP. CODD HOT 
4E il 340,916,981 3 844/1458 1021 
10E ay 367,135,103 11 4,998/10,500 2586 
4H 14 130,739,585 9 7,185/10,500 461.7 496.5 
6H 35 138,525,451 13 10,048/10,500 3015.7 581.0 
8H 45 143,526,951 15 13,861/10,500 2431.3 1067.6 
10H 54 169,532,339 20 13,861/10,500 5352 1393.5 


(*IBM 3081K) 


The 4-commodity problems have approximately 1321616 
constraints and 41,600 variables: the 10-commodity problems 
have about 33,000 constraints and 104,000 variables. Hot 
start solutions use a pure network basis crash, and all 
these solutions factor the joint capacitation constraints as 
generalized upper bounds. However, pure network 
factorization is not employed. 

The principal motive for these runs is to establish 
completely optimal yardsticks for the competing indirect 


methods. 
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B. INTRA-ALGORITHM STUDIES 

We first discuss the selection of parameter settings for 
the RSD algorithms, and then' we discuss issues involving 
DDC, including obtaining bounded master problem solutions 
and cut dropping. For RSD(P), the parameters of interest 
are the initial value of h, the h multiplier a, and the 
penalty exponent, t. For RSD(A), we examine h and a, fixing 
the exponent at 2.0. In our tests, initial values for h are 
taken as a fraction of the largest cost used in the network, 
in this case a bypass penalty cost of 62,700. Values tested 
in RSD(P) were 627.0, 62.7, and 6.27. Other researchers 
(e.g., Bertsekas, 1982) suggest penalty multipliers in the 
range of 4 to 10 when solving the problem optimally for each 
value of h. However, since we increase h frequently based 
on periodic reallocations and lower bound adjustments, 
penalty multiplier values between 55 and 4 are 
investigated. Finally, to represent the range of possible 
exponents we test the algorithm for t = 1.1, 1.5 and 2. The 
exponent t = 1.0 is not useful in this algorithm since it 
results in uj4 = h for all j in Jy at every iteration. 

Table 4.2 summarizes response to changes in the size of 
the penalty multiplier in RSD(P). The results indicate the 
upper and lower bounds and final gap based on 21 iterations 
of the algorithm for problems 4H and 10H. A multiplier of 


1.5 seems to work best in the smaller problem, while 4.0 


oo 


TABLE 4.2 


RESPONSE TO CHANGING PENALTY MULTIPLIER 


PROD. MUTT. INIT EXP LB UB GAP 
a h Le 10 10 % 
4H ZO Giz 7 130.4 130.74 - 208 
Sr AT 13i0). 67 130e75 . Olio 
ZpeiG) 6.27 Pee. / 130.74 7a 
Pe G27 diese) L3On2 1310” 76 ~44 
10H Gres. / 162 Lao. 1 4.79 
65.27 163.4 17046 4.22 
4. Gare / L50 «2 170.1 lle 
Ge 27 T6OS mo. 8 5 we 
Gia. 7 SO wee 172 So. 
: Ges 7 ISG) aa, 171.2 10.8 
G2. 7 ive Fea 9 170.6 3.98 
; 62557 f. Pe 7 Om} 4.91 


generally works best in the larger problem. However, in 


both cases, the response is not substantially worse for a 
multiplier of 2.0. Therefore, we fix the multiplier @e weg 
and concentrate further studies on the other two parameters. 

Recalling that in the subproblems the gradient of the 
penalty term provides an estimate of the optimal Lagrange 


multipliers, 
uyX = ne (axk-b,)+)t-1 


it is evident that h and t work ‘in concert to affect the 
magnitude of the multiplier estimates at each iteration. We 


want a combination which produces multiplier estimates large 
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enough to generate new extreme points, yet small enough to 
allow good lower bound estimates. That is, as uz kb, grows 
iLek@e (=; Vv (u,*) degrades as a lower bound estimate. Table 4.3 
and Figure 4.2 show the response of RSD(P) to changes in h 
and t. Results are for 21 iterations of the algorithm, with 
the penalty multiplier fixed at 2.0 for all cases. In each 
case we retain a maximum of seven extreme points plus the 
current master problem solution, and perform ae primal 
resource allocation every seventh iteration. 

We summarize the computational results as follows. 
First, there is a distinct advantage to starting with a 
small penalty parameter. Initial values of h = (max. arc 
cost) x 1073 or 1074 provide superior overall performance in 
the test problems. Second, the penalty exponent t = 1.1 
shows a definite overall disadvantage to values of 1.5 and 
2.0. Both t = 1.5 and t = 2 perform well for an appropriate 
matching value of h, but we favor t = 1.5 because it 
provides the best overall final results, apparently reducing 
the dominance of the most grossly violated arcs when 
uy = h((A5x-b,)*)+5 is computed. 

Figure 4.2 graphically displays the interactions between 
h and t in RSD(P) by depicting upper and lower bounds 
relative to the optimal solution of problem 10H. The heavy 
black lines suggest that the best response is with h = 62.7, 


t = 1.5, but it appears that any choice of 0 < h < 70 and 
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Figure 4.2 Response to Changing Parameters, 
RSD(P) on Problem 10H 


1.5 < t < 2 will perform well. Note that the upper bound is 
nearly optimal throughout the suggested range. 

The parametric responses reported here suggest a simple 
method for selecting appropriate starting parameters for the 
penalty algorithn. First, we choose the h multiplier a = 
200 @nd @@F 1.5 = t < 2.0. Then we make an estimate, W235) 
of the optimal dual variable associated with the most 
Violated constraint found when LR(0) 1s solved, and select h 


so that 
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Better guesses of uj, evidently evoke better performance of 
the algorithm. In this case, we simply use a fraction of 
the largest cost as our estimate; more sophisticated 
estimates may be made by examining differences in costs 
between least and most expensive paths through the network, 
on by solving a Single problem with aggregated 
multicommodity supplies and demands. One point is clear 
from the data: it is better to underestimate the best 
initial value for h than to overestimate, because the 
algorithm is quickly self-correcting for low estimates. 
That is, if the initial penalty is so small that it does not 
produce a new extreme point, the algorithm immediately (and 
repeatedly) increases the penalty parameter until it does. 
Furthermore, when the initial multiplier estimates are good, 
the subproblems generate extreme points more closely related 
to an optimal solution. Consequently, both lower bounds and 
primal solutions quickly benefit. On the other hand, 
excessively large penalties generate extreme points with 
little relation to the optimal solution, producing poor 
lower bounds and degrading the primal solutions obtained 
through resource allocations. 

Figure 4.3 shows the interactions of parameters for 
RSD(A). In this case, we use a constant penalty exponent t 
= 2.0, and vary h and a. As indicated by the response 


surfaces in the figure, the best bounds are obtained for a 
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combination of small initial penalty parameters and smaller 
multipliers. We fix the multiplier at 1.2 in subsequent 
tests. Again, the initial value of h may be calculated from 


au, estimate using 

h x [max (Ajx°-bj4) 7} ig W153 , 
and the algorithm quickly corrects for underestimates but 
recovers slowiy from overestimates. 

The computational experience presented by Hearn et al. 
(1984) indicates that performance improves as the number of 
retained extreme points is increased. In fact, if enough 
points are retained, theoretically the heuristic becomes a 
finite algorithm. Performance in our tests improves for up 
to about eight points, and then levels off. As the number 
of retained points increases, the time spent in the master 
problem increases. In the following section, we fix the 
number of retained points at eight and obtain quite 
satisfactory performance. 

Four performance issues concern us with algorithm DDC. 

The first is reaching a bounded condition in the dual 
master problem (or, equivalently, reaching a feasible 
solution to the Dantzig-Wolfe master problem). Since the 
constraint set of the dual master problem may be viewed as 
forming a piecewise linear tangential approximation of the 


Lagrangean dual function, we obtain no feasible primal 
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Figure 4.3 Response to Changing Parameters, 
RSD(A) on Problem 4H 


solution until the set is bounded. In problem 10H, for 
instance, this requires 6 iterations. 

One method to obtain early boundedness is to perform a 
resource allocation from the first relaxation, as in the RSD 
algorithms, and append this information to the DDC master 
problem as a "pseudocut." This method does provide an early 
feasible solution and the first pseudocut seems to remain 
binding for many iterations. However, our results show that 


it does not improve the point at which boundedness is 
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naturally achieved in the master problem using cuts 
generated by the subproblems, nor does it affect the 
subsequent convergence trajectory. 

Second, we must make an intelligent choice of the 
decomposition tolerance e in the cut thresholds ubte. e 
acts as a relaxation parameter. If e is too small, we 
restrict the size of our improvements in the dual 
multipliers and inch uphill; if e is too large, we overshoot 
good multiplier choices and may oscillate through a sequence 
of extreme points which generate poor cuts and consequently 
poor dual solutions. Moderation is a virtue in the choice 
of e. For this set of test problems, we initially set e = 
100,000 since no dual variable is expected to exceed 62,700 
and obtain reasonable performance by reducing e by half 
whenever e-optimality is achieved (1.e., whenever the cuts 
cannot be satisfied). A final value er terminates this 
sequence of master problem relaxations. 

4 guakiate the decomposition goals are a daunting 
complication at first glance. However, a goal constraint 
for each master problem variable can be incorporated as a 
generalized upper bound, and the resulting master problem 
solved with very little additional effort. 

Finally, we must consider the effects of accumulating 
too many cuts in the master problem. As the number of 
retained cuts grows, more time and space are required to 


solve the master, and more time is required to regenerate 


oy 


solutions from peripheral storage. However as the 
approximation to the dual response surface improves, many of 
the earlier cuts become non-binding, either temporarily or 
permanently. Figures 4.4 and 4.5 show cut histories for 
problems 4H and 10H, respectively. In these figures, the 
height of the "skyscraper" indicates the weight of the cuts 
at each iteration. Both results show that only a few cuts 
reenter a subsequent solution after once becoming slack, and 


then only at a small dual weight. 
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Figure 4.4 Active Cuts at Each Iteration 
DDC on Problem 4H 
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We take advantage of this property by restricting the 
number of cuts retained to Npoiq = 20. We also use an 
exponential moving average of the previous cut weightings 
(master problem dual variables) to determine which cuts to 
bee} eis In the (unlikely) event that a taut cut must be 
replaced, the (upper bound) value of a recoverable primal 


solution must be relaxed accordingly. 
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Figure 4.5 Active Cuts at Each Iteration 
DDC on Problem 10H 
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The master problems are best solved by using a basis 
crash from the preceding solution in the sequence (except in 
the rare case that a taut cut has been replaced). 

For our problems, the master problems have as many as 
Nhold = 20 (dense) cut constraints, and 1,071 decomposition 
goal (GUB) constraints, one for each of the 1,071 variables. 
Master problem solution times average 0.2 seconds (IBM 3033 
AP). 

The pure network subproblems have 3,300 nodes and 10,400 
arcs, and solve in about 1 second per commodity. To reduce 
these dominating computation times, a hot start mechanism is 
used which initially restricts the network to those 
variables known to have had positive flow in any prior 
solution for that commodity. This restriction tends to 
reduce solution time as more experience iS gained over 
iterations. After about 10 cuts, network subproblems rarely 
use any new arc not used before. 

The networks subproblems are much more expensive to 
solve in DDC than the LP master problems. This is reversed 


from experience with primal decomposition, for which the 


master problems are commonly mixed integer LPs (e.g., see 
Geoffrion and Graves, 1974, or Brown, Graves and 
Honczarenko, 1983). Unfortunately, further mechanisms to 


reduce network solution times require additional storage, 
and we have limited our implementation to operating with a 


default 2 M-byte VM/CMS region. Overlays are used for each 
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commodity subproblem and the master problem. We do not want 
to limit the number of commodities and have therefore used 


no data structure spanning commodities. 
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C. INTERMETHOD COMPARISONS 

We now turn our attention to comparisons among the 
various algorithms. We establish acceptable solution gaps 
of 1% and 4% for the four and ten product problems, 
respectively, for two reasons. First, for such large-scale 
planning models, these levels of accuracy are adequate. 
Second, in order to maintain integer arithmetic in the 
network problems, we round multiplier estimates and capacity 
allocations, inducing the possibility that exact optima to 
the original problem have been excluded. 

Figures 4.6 and 4.7 show the significant computational 


advantage of RSD(A) over RDLB. For both test problems 4H 
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Figure 4.6 Solution Trajectory Comparison (Problem 4H) 
RSD(A) vs. RDLB 
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Figure 4.7 Solution Trajectory Comparison (Problem 10H) 
RSD(A) vs. RDLB 


and 10H, the performance of the RDLB is much worse than 
RSD(A) because the subgradient reallocations are generally 
ineffective. In fact, although the subgradient realloca- 
tions lead to an optimal solution for problem 4E (not shown) 
no improvements are found for 10H, and improvements totaling 
.007% out of an 11.8% gap are made in 4H, only by using an 
exceedingly small step size. 

The poor primal performance of RDLB 1S compounded, by 
slow convergence of the Lagrangean problem. As indicated by 


the figures, it usually requires accumulation of three to 


iQ 


five new subgradients before there is enough information 
available to £ ne a good conjugate direction and 
substantially improve the lower bound. Since only a few 
costs are typically changed in solving the second set of 
network subproblems for the quadratic approximation to the 
line search, we use the previous bases as a "hot restart," 
but the time savings is not adequate to make the algorithm 
competitive in performance with the other algorithms. Even 
though other lower bounding strategieS are available for 
testing, no further tests are performed on this algorithm 
because of its combined poor performance on both lower and 
upper bounds. 

For the two test problems shown, the RDLB' method 
requires significantly longer to do less. The algorithm 
terminates with final gaps of 5.83 and 10.7 percent, both 
unacceptable by our stated standard. The primal bound is 
relatively poor in both problems, corroborating the results 
of other researchers (e.g., Allen (1985)) which’ show 
subgradient-based resource direction unable to achieve an 
optimal solution. 

We now compare RSD(A) to RSD(P) (t = 1.5). In Figure 
4.8, RSD(A) takes about 60 seconds on problem 4H to reach a 
solution within .15% of optimal, while RSD(P) reaches a .69% 
solution in under 120 seconds. In Figure 4.9, the augmented 
form reaches a 4% solution to problem 10H in about 270 


seconds and a 3.56% solution after 21 iterations in about 
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Figure 4.8 Solution Trajectory Comparison 
RSD(A) vs. RSD(P), Problem 4H 


440 seconds. The penalty version takes nearly 470 seconds 
to reach the 4% level, closing at 3.93%. The X-system on an 
IBM 3081K solves this problem optimally in 1393.5 seconds 
using the network hot-start procedure (5,352 seconds 
without!). Considering the difference in computer speeds, 
RSD(A) reaches an acceptable solution in 1/10 the time of 
the X-system. 

In comparing RSD(A) and RSD(P), we note not so much the 


difference in performance times as the movement of the lower 
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bounds. The bound for RSD(A) climbs more quickly than 


RSD(P) apparently due to better instantaneous multiplier 
estimates and the fact that h is increased in smaller steps 
in RSD(A), resulting in a better-conditioned master problem 
at each iteration. Complete convergence of bounds is not 
anticipated for either algorithm, because we are rounding 
the multiplier estimates to perform integer arithmetic in 

the subproblems, we are not scaling the subproblems, and we 


are retaining only eight points. However, the solutions 
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obtained by both algorithms are adequate, especially since 


the primal incumbents invariably seem to be very near 


optimal. 

Finally, Figures 4.10 and 4.11 compare the performance 
of RSD(A) to the performance of DDC. For both problems 4H 
and 10H, the RSD(A) algorithm significantly outperforms the 
dual algorithm. While RSD takes about 60 seconds to reach a 


solution within .15% of optimal in 4H, DDC required 160 


seconds to reach a 1% solution and about 180 seconds to 


reach a gap comparable to RSD(A). 
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Figure 4.10 Solution Trajectory Comparison 
RSD(A) vs. DDC, Problem 4H 
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Figure 4.11 Solution Trajectory Comparison 
RSD(A) vs. DDC, Problem 10H 
In 10H, the comparison 1S Similar. RSD(A) reaches a 4% 


solution in about 180 seconds and 3.56% in 440 seconds. DDC 
terminates at 650 seconds after 50 iterations with an 8.5% 
gap. These results are especially significant since the 
dual decomposition algorithm has a reputation for rapid 
initial progress. It is evident from both test problems 
that the RSD(A) algorithm performed better both on the upper 


and lower bounds. 
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Table 4.4 summarizes the final results for all 
algorithms and basic test problems run on an IBM 3033AP. 
Notice that RDLB runs reasonably well on 4E, but its times 
and solution quality are unacceptable for the hard problems. 
On the other hand, RSD(A) produces near-optimal primal 
solutions, acceptable bounds, and fast times for each test 
problem. The dual decomposition produces good _ final 
results, but through a poorer trajectory, and shows signs of 
laboring on problem 10H. 

Finally, we construct a problem of 100 products to test 
DDC and RSD(A). It has 31 initial capacity violations with 
a maximum violation of 238% of arc capacity. Due to the 
increase in problem size, the initial value of h is reduced 
by a factor of 10 in RSD(A): no other changes have been 
made. Figure 4.12 shows that RSD(A) reaches an acceptable 
4% solution in about 1000 seconds and concludes 21 
iterations in 3000 seconds with a 1.5% gap. DDC terminates 
at 4090 seconds and 50 iterations with a 12.15% gap 
remaining. We note that a previous effort on this same 
problem using a resource-directive algorithm achieved an 
11.8% solution in 1000 seconds (Staniec, 1984), but made no 
further improvements in an hour of computation. The RSD(A) 
algorithm achieved an acceptable solution in less than 17 
minutes for a -problem of some 330,000 constraints and 


1,040,000 variables. 
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on 100 Commodities 
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V. CONCLUSION 


We have presented three algorithms for solving large- 
scale linear programs that fall within the general framework 
of decomposition, with specific application to the MCTP. 

The first algorithm, a resource-directive decomposition, 
has contributed a new, simplified method of projecting 
subgradient reallocations in the primal problems, and an 
improved method for terminating the algorithm via conjugate 
subgradient directions and approximate line searches in the 
Lagrangean lower bounding routine. However, the method is 
not computationally effective due to inability to find 
improving subgradient reallocations and due to computational 
burden in the line searches. 

The second algorithm is a dual decomposition using a 
master problem which is the relaxed dual of the standard 
Dantzig-Wolfe master problem. Tests show the method to be 
computationally effective, but with slow convergence for 
very large problems. 

The last algorithm (with two variants) iS a new, non- 
linear price-directive decomposition using a penalty 
transformation of the original problem. Computational tests 
show the method to be approximately ten times faster than a 
direct simplex-based algorithm, and 2-3 times faster than 


the dual decomposition tested. 
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The initial success of the RSD algorithm prompts several 
areas for further investigation. First, the test suite is 
limited. We intend to test the algorithm on other large- 
scale problems 

Second, Hearn et al. (1984) suggest that quadratic 
approximations to the master problem may speed its solution 
significantly without degrading convergence rate. Bertsekas 
(1982a) has described a projected Newton method for solving 
non-linear objectives in the presence of simple constraints. 
We propose development of a form of the Bertsekas algorithm 
which is capable of handling the penalty transformation of 
PerP. 

Third, it 1S reasonable to assume other non-linear 
objectives may have attractive features. For instance, the 
logarithmic barrier function has recently been the subject 
of extensive research for solving linear programs (see, for 
instance, Gill et al. (1986) or Karmarkar (1984)). We 
propose investigating interior point forms of the algorithn, 
with the possibility OF developing a hybrid 
interior/exterior point algorithm, taking advantage of both 
penalty and barrier theory to solve linear programs. 

Fourth, we propose a hybrid algorithm using’ the 
augmented Lagrangean form of RSD combined with dual 
decomposition to take advantage of the best properties of 
both algorithms. Via RSD, we quickly generate good 


estimates of the optimal dual multipliers, and therefore 


Yas 


favorable extreme points. However, convergence slows due to 
the restricted number oof extreme _ points retained. 
Increasing the extreme point count equates to increasing the 
time spent solving the master problem. However, these 
favorable extreme points may all be used to produce cuts for 
the dual decomposition master problem, which can be solved 
more quickly and precisely via linear programming. We 
anticipate the result to be a hybrid algorithm which has 
both favorable initial and terminal convergence properties. 

Finally, we consider an extension of the algorithm from 
the shipment planning to a shipment scheduling framework, in 
which, for instance, we must make binary decisions on sea 
shipment arcs to account for shiploads of material. This 
will entail surrounding the RSD algorithm with a shell 
capable of interpreting information to make binary 
selections. One possible way to accomplish this is to 
develop a specialized version of the cross-decomposition 
algorithm of Van Roy (1986) which incorporates RSD. 

In conclusion, the result of this research has been a 
new, computationally attractive algorithm for large scale 
linear programs using non-linear methods. Also, through the 
computational results produced, it has documented some of 
the shortcomings of the subgradient approach in resource 
direction. It is evident that more information (e.g., a 
formal Benders master problem) is required to _ solve 


aifficult problems by resource direction. 
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APPENDIX 


CONVEXITY OF THE QUADRATIC PENALTY FUNCTION 


The following lemma demonstrates that the quadratic 
penalty form of the MCTP is everywhere convex, which makes 
it appropriate for the application of RSD. Convexity of 


other penalty forms may be proved in a Similar manner. 


Lemma: The EVNGe LON, Guay, x) is convex 16a all 
x Fo= ({(xXiNx = 5,0 = x < b7}. 
ETO tL ; q(h,x) = cx + P(h,x), where 
P(h,x) = Sh (Ax-by) +] '(Ax-b,)*. Since cx is 


trivially convex and the sum of convex functions is 
convex, we need only show P(h,x) to be convex; that 


is, that 


P(h,x) > P(h,x) + VyP(h,x)(x-x), for all x « F 


where VP,(h,x) = h[(Ax-b1)T]'A 
Then we define Jy = (j |A;x-b}- > 0} and revert to 


Summation form. P(h,x) is convex if 


; = be 2 
) =h[(Ajx-b,5)*)? > } sh (Ajx-b 5)? + YY (A3x-b5) Ag (x-x), 
7 Jedy JeTy 


1a 


Rewriting the left side as 


h[(A3x-b,J)+]2 = J We (Ajx-by5)7]2 + J nlayx-b5)*)? 
JeIdy Jsédy 


and noting that ) h[ (Ajx-b14)7]? > 0 for all & 
jy 


we May prove convexity if 


) - +42 WV -" 2 = — 
| ee Baba aa Raglan La (Gh(Ajx-by 3)? + h(A3X-bz 5) Ay (x-X) } 
J *¥V Jevy 


or, considering additivity of convex functions, if 


1 Jt a _ - 
3 hl (A3x-b,4) 7) > Zh[ (A;x-b,4) 7]? + h (A4x-b 14) 7A; (xXx-xX) 
for allexae<ef, ands jie aie 
If (A;x-by;)* = 0, the right side of the 
inequality is 0, and h[ (Ajx-by4) 7]? > 0 trivial 
Now, for (Ax-b,4)* > 0, letting uj; = A4x-b]q, 
dropping h we write 


_— = 2 1— a oa e 
HA;X-bz3)? + (A}x—bz5) Ag (HOR) = Uy (A}X-bzj) + U5 (A}XHAQX) 
a — - 
= uy [A4x-(b15)+4A5x-b]) -A5x+ (675) ] 
2 1 - 
aes [A4x-bj 5 = 7 (A5X-b15)) 
ae 


- 1 
—eee) (A4x-bj5) - 7uU; 


i 
Convexity is proved if 7 [ (A3x-b35) 7)? > Uy (Ajx- 


aig) = =us?, Two cases occur: 
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ee NO a ae 
For all x | (A5x by3) = 0, then Ajx b15 < 0 and we 


= in 
have = ( (Ayx-bz5) 7)? = 0 > uj(Ajx-b145) - 7 us? = 
{ (Ajx-by4)*)% + (A¥X-b5) *Ag (xX) - 


and note (us -u5) 2 = On 
Jeibhe (u4-u4)- = u4? = 2uju4 zt us? > 0, and us? oa 
us 4 + 2U4-u5, so multiplying by h and restoring 


values, we have 
1 nl (Asx-b,+)7)2 >t ne (as¥-b,5)7)]2 + h(Asx-b1=) tAs (x-x) 
Dp J Uy =) J WW J ita) i) 


Since Ps (h,xX) 1s convex for all j and the summation 
of convex functions iS convex, q(h,x) is convex. 


QED 
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